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The  objective  of  this  study  is  to  develop  new  fast  algorithms  for  use  in  image 
decompression  and  to  show  that  the  developed  algorithms  outperform  traditional 
algorithms  in  computational  complexity.  Specifically,  this  study  focuses  on  a coding 
scheme  called  progressive  image  transmission  (PIT).  Using  such  a scheme,  images 
can  be  reconstructed  gradually  at  the  receiver  through  an  inverse  PIT  process. 
Therefore,  it  is  particularly  useful  for  image  telebrowsing  systems. 

The  dissertation  is  presented  as  follows.  First,  a general  framework  by  which 
fast  inverse  PIT  can  be  developed  is  proposed.  Second,  based  on  the  framework,  fast 
inverse  discrete  cosine  transform  and  inverse  discrete  Fourier  transform  algorithms 
are  developed.  Finally,  the  computational  complexity  of  the  developed  algorithms  are 
compared  with  that  of  the  traditional  algorithms  through  analytical  and  simulation 
studies. 


VI 


The  analysis  indicates  that  the  developed  algorithms  have  lower  computational 
complexity  when  compared  to  traditional  algorithms.  The  main  reasons  are  that  the 
developed  algorithms  (1)  exploit  the  feature  that  for  most  images  the  input  matrices 
to  an  inverse  transform  are  very  sparse,  and  (2)  make  use  of  the  symmetrical 
property  of  the  basis  functions  of  the  transform.  The  algorithms  also  have  the 
following  features:  (1)  They  allow  transformed  coefficients  be  transmitted  in  any 
order,  including  the  zig-zag  scanning;  (2)  They  have  essentially  no  delay  time;  i.e.,  the 
image  reconstruction  can  be  initiated  as  soon  as  the  first  coefficient  is  received;  and 
(3)  The  images  can  be  reconstructed  progressively  with  any  degree  of  smoothness. 
These  features  make  the  algorithms  particularly  suitable  for  use  in  a runlength  based 
entropy  coding  scheme  such  as  the  one  employed  in  the  JPEG  standard. 
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CHAPTER  I 
INTRODUCTION 


1.1  Background  and  Motivation 

Today  the  typical  size  of  a digital  image  file  can  range  from  several  million 
bits  to  more  than  a hundred  million  bits  due  to  the  successful  development  of  high 
resolution  image  capture  and  display  devices.  The  transmission  of  these  images  is, 
however,  time  consuming  under  the  current  technology  in  both  transmission 
hardwares  and  softwares,  such  as  image  coding  algorithms.  Even  with  the  use  of  the 
latest  technology  such  as  broadband  integrated  services  digital  network  (BISDN), 
which  adopts  fiber  transmission  media,  it  is  still  not  possible  to  handle  such  an 
amount  of  image  data  without  degrading  its  services.  As  a result,  network  users  may, 
for  example,  receive  partial  images  with  bad  quality  or  receive  complete  images  with 
unacceptable  delay  time  due  to  network  congestion.  The  problem  would  become 
even  more  serious  when  the  transmission  is  operated  under  a low  bit-rate  channel. 
With  the  rapid  development  in  transmission  hardwares  and  image  coding  (including 
compression  and  decompression)  techniques,  it  is  hoped  that  in  the  near  future  the 
transmission  of  such  amount  of  image  data  can  be  as  efficient  as  sending  an  E-mail 
today  in  a public  computer  network  (Seitz  and  Lang  [1990]). 

To  illustrate  the  problem,  consider  ten  8-bit  512  x 512  grayscale  images.  The 
total  amount  of  image  data  to  be  transmitted  is  2,621,440  bytes.  If  color  images  are 
considered,  the  amount  of  data  is  even  larger.  If  the  images  are  transmitted  in  an 
effective  rate  of  2,400  bits/sec,  more  than  8,738  seconds  or  2 hours,  25  minutes  and 
38  seconds  are  required  to  send  the  images.  Over  2 hours  of  transmission  time  is  not 
acceptable  in  most  applications  such  as  image  telebrowsing  systems. 
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One  important  evaluation  criterion  for  the  performance  of  a telebrowsing 
system  is  its  response  time,  defined  as  the  time  elapsed  from  the  moment  a retrieval 
request  is  issued  until  the  desired  information  is  actually  displayed  on  the  monitor 
(Degan  et  al.  [1990]).  The  response  time  is  spent  mainly  on  three  major  operations: 
searching  for  the  desired  information,  transmitting  the  information  through  a 
channel,  and  displaying  information  on  the  monitor.  Early  studies  on  the  telebrowsing 
systems  concentrated  on  the  efficient  retrieval  of  pure  text  information  (Tou  et  al. 
[1976],  Tou  and  Depree  [1978]).  In  such  cases,  only  searching  time  is  of  concern 
because  the  contribution  of  transmission  and  display  time  to  the  total  response  time 
is  usually  insignificant.  However,  for  modern  telebrowsing  systems,  where  multimedia 
information,  including  text,  audio,  image,  and  video  information,  is  considered,  the 
transmission  and  display  time  becomes  a significant  portion  of  the  response  time 
because  of  the  huge  amount  of  data  involved  in  the  process. 

The  response  time  of  an  image  telebrowsing  system  should  be  sufficiently 
short  in  real  world  applications.  The  transmission  rate  can  be  increased  so  that  the 
response  time  is  reduced  to  a desired  level.  However,  this  usually  requires  a 
significant  increase  in  the  transmission  cost.  A more  economic  approach  is  to 
compress  the  image  data  before  transmission,  and  to  transmit  only  part  of  the 
compressed  data  needed  for  telebrowsing  images.  The  idea  leads  to  a special  coding 
scheme  called  progressive  image  transmission  (PIT).  The  progressive  coding  scheme 
allows  an  approximate  reconstruction  of  an  image.  The  fidelity  of  the  image  is  built 
up  gradually  until  the  viewer  decides  either  to  abort  the  transmission  sequence  or  to 
allow  for  further  reconstruction.  Besides  the  intended  application  of  PIT  in  image 
telebrowsing  systems,  it  can  also  be  applied  to  other  applications,  including 
teleconferencing,  medical  diagnostic  imaging,  videotex,  security  services,  and 
electronic  shopping  in  mail  order  (Frank  et  al.  [1980],  Hill  et  al.  [1983],  Elnahas  et 
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al.  [1986],  Tzou  [1987],  Mongatti  and  Alparone  [1989],  Chitprasert  and  Rao  [1990], 
Goldberg  and  Wang  [1991]). 

With  the  PIT  techniques,  transmission  time  can  be  greatly  reduced.  However, 
this  also  increases  the  display  time  because  an  inverse  PIT  process  is  required.  In 
general,  however,  the  increase  in  display  time  is  well  compensated  by  the  decrease 
in  transmission  time  due  to  the  use  of  the  PIT  techniques.  Nevertheless,  if  dedicated 
hardwares  for  the  inverse  PIT  process  are  not  available,  the  contribution  of  display 
time  to  the  overall  response  time  can  still  be  significant.  Therefore,  it  is  desirable  to 
find  a way  to  reduce  the  display  time  in  the  inverse  PIT  process.  Since  the  major  task 
of  the  inverse  PIT  process  is  in  the  decompression  of  images,  there  is  a need  to  have 
fast  image  decompression  schemes  that  are  suitable  for  the  inverse  PIT  process. 

A complete  image  coding  scheme  includes  a compression  scheme  and  a 
decompression  scheme.  The  decompression  scheme  used  at  a receiver  must  be 
compatible  with  the  compression  scheme  used  at  a transmitter.  By  compatible  we 
mean  that  the  decompression  scheme  should  be  an  exact  inverse  process  of  the 
compression  scheme.  Many  image  coding  schemes  have  been  proposed  in  the  past. 
The  performance  of  these  coding  schemes  are  usually  evaluated  using  the  following 
three  criteria;  (1)  the  ability  for  compression  schemes  to  compress  the  image  data, 
(2)  the  quality  for  the  decompression  scheme  to  reconstruct  the  image,  and  (3)  the 
computational  complexity  for  the  compression  and  decompression  schemes  to 
compress  image  data  and  reconstruct  the  image.  The  ability  to  compress  the  image 
data  is  usually  measured  by  an  index  called  compression  ratio.  To  illustrate,  if  an 
uncompressed  image  requires  8 bits  to  represent  a pixel  (8  bits  per  pbcel  or  simply 
8 bpp),  but  after  the  compression  it  requires  only  1 bit  to  represent  a pixel  (1  bpp), 
then  the  compression  ratio  is  8 to  1 (or  8:1).  The  quality  of  a reconstructed  image 
can  be  evaluated  either  subjectively  or  objectively.  Most  subjective  criteria  are  based 
on  whether  the  reconstructed  image  is  pleasant  to  watch  by  human  eyes.  One 
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commonly  used  objective  criterion  is  the  "signal  to  noise  ratio"  of  the  reconstructed 
image.  The  evaluation  results  based  on  a subjective  criterion  may  be  different  from 
those  based  on  an  objective  criterion.  As  to  the  computational  complexity,  the  total 
number  of  additions  and  multiplications  divided  by  the  total  number  of  process  pixels 
is  one  commonly  used  quantitative  measure  for  the  image  coding  schemes.  A good 
coding  scheme  is  the  one  that  has  high  compression  ratio,  good  image  quality,  and 
low  computational  complexity. 

To  date,  there  is  no  particular  image  coding  scheme  which  can  be  considered 
to  be  the  best  under  all  three  criteria  described  above.  However,  it  is  widely  accepted 
that  the  coding  scheme  based  on  the  two-dimensional  discrete  cosine  transform  (2-D 
DCT)  can  perform  relatively  well  under  all  these  criteria  when  compared  to  other 
schemes.  The  principle  behind  the  2-D  DCT  coding  is  that  after  the  transform  the 
signal  energy  would  tend  to  cluster  in  the  low  frequency  region.  Therefore,  using  a 
few  transform  coefficients  one  can  make  a reasonable  reconstruction  of  the  original 
image,  and  by  using  more  coefficients  one  can  refine  the  reconstructed  image.  This 
feature  makes  the  2-D  DCT  extremely  attractive  for  the  PIT  (Tzou  [1987]).  That  is 
also  part  of  the  reason  why  the  DCT-based  approach  is  chosen  as  the  first 
international  digital  image  compression  standard  for  continuous-tone  (multilevel)  still 
images,  both  grayscale  and  color  images  (Leger  et  al.  [1991],  Wallace  [1991], 
Gonzalez  and  Woods  [1992],  Wallace[1992]).  The  standard  is  known  by  the  acronym 
JPEG,  for  Joint  Photographic  Experts  Group,  which  is  the  name  of  a joint  ISO 
(International  Standards  Organization)  and  CCITT  (Consultative  Committee  for 
International  Telegraph  and  Telephone)  committee.  A summary  of  the  JPEG  coding 
scheme  is  included  in  Appends  A.  It  is  expected  that  the  JPEG  standard  will  bring 
a significant  impact  on  the  image-coding  industry  in  the  future.  However,  as  far  as 
implementation  of  the  standard  is  concerned,  the  standard  provides  only  a guideline. 
How  to  implement  the  standard  efficiently  for  certain  applications  still  relies  on  the 
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ingenuity  of  designers.  For  example,  to  preserve  freedom  for  innovation  and 
customization  within  implementations,  the  JPEG  has  chosen  not  to  specify  either  a 
unique  forward  DCT  (FDCT)  algorithm  or  a unique  inverse  DCT  (IDCT)  in  its 
proposed  standard.  This  is  also  because  the  JPEG  recognizes  that  the  research  in  fast 
DCT  algorithms  is  still  ongoing  and  no  single  algorithm  is  considered  optimal  for  all 
implementations  (Wallace  [1992]). 


1.2  Objectives 

The  following  two  problems  are  studied  in  this  dissertation.  First,  is  there  a 
better  way  to  perform  IDCT  than  traditional  fast  IDCT  in  the  inverse  PIT  process? 
Second,  can  we  develop  fast  inverse  transforms  for  non-DCT  transforms  in  the  same 
way?  That  is,  is  there  a general  theory  for  developing  fast  inverse  transforms  in  the 
inverse  PIT  process? 

The  goal  of  this  dissertation  is  to  develop  new  fast  algorithms  for  use  in 
inverse  PIT,  and  to  show  that  the  performances  of  the  proposed  algorithms 
computationally  outperform  the  traditional  fast  inverse  transforms. 

The  specific  objectives  of  this  study  are  as  follows: 

1.  Propose  a general  framework  by  which  fast  inverse  transforms  can  be 
developed  for  the  use  in  inverse  PIT  process. 

2.  Develop  new  fast  IDCT  and  IDFT  (inverse  Discrete  Fourier  transform) 
algorithms  based  on  the  proposed  framework. 

3.  Compare  the  computational  complexities  of  the  proposed  fast  IDCT  and 
IDFT  algorithms  analytically  and  experimentally  with  their  traditional 
counterparts. 


1.3  Scope  of  This  Dissertation 

To  provide  an  overall  picture  of  the  image  transform  coding  system  of  interest 
in  this  study,  a block  diagram  of  the  system  is  presented  in  Fig.  1.1.  This  is  a general 
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A Data  Block  X Transform-Based  Encoder 
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Compressed 

Data 


Image  Data 


Fig.  1.1  The  block  diagram  of  a transform-based  coder. 
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block  diagram  for  most  transform-based  image  coding  schemes  including  the  JPEG 
coding  schemes. 

Fig.  1.1  illustrates  the  transmission  of  a single-component  (grayscale)  image. 
Color  image  transmission  can  approximately  be  regarded  as  transmission  of  multiple 
grayscale  images  (Wallace  [1991],  Wallace  [1992]).  At  the  input  to  the  encoder, 
source  image  data  are  grouped  into  blocks  with  the  same  size  of  M x N.  Usually, 
M = N=2P  in  practical  applications,  where  p is  a positive  integer.  Each  M x N data 
block  X is  processed  by  a two-dimensional  transform.  In  this  dissertation,  the  two- 
dimensional  transform  is  assumed  to  be  separable.  That  is,  the  two  dimensional 
transform  can  be  performed  by  a series  of  one  dimensional  transforms.  In  Fig.  1.1, 
§ and  T are  the  transform  matrices  with  the  sizes  of  MxM  and  NxN,  respectively.  The 
conjugate  of  # and  the  transpose  of  Y are  defined  as  §*  and  T* , respectively.  It  is 
also  assumed  that  the  transforms  discussed  in  this  dissertation  are  unitary.  In  other 
words,  = and  T ‘T  ‘ = where  Ip^^p  is  a P x P identity  matrix.  The 
mathematical  detail  of  these  transforms  will  be  given  in  Chapter  IV. 

The  output  of  the  forward  transform  is  the  transform  coefficients  represented 
as  Y in  Fig.  1.1.  For  each  data  block  there  are  MN  transform  coefficients  in  total. 
Each  of  the  MN  coefficients  is  then  quantized.  The  purpose  of  the  quantization  is  to 
achieve  further  compression  by  representing  transform  coefficients  with  no  greater 
precision  than  is  necessary  to  achieve  the  desired  image  quality.  For  example,  each 
coefficient  is  originally  represented  by  24  bits  and  now  only  8 bits  are  used.  In  this 
case,  many  coefficients  that  are  different  when  represented  in  24  bits  may  become 
the  same  when  only  8-bit  precision  is  used.  Furthermore,  very  small  coefficients  may 
become  zero  after  quantization.  Therefore,  quantization  is  basically  a many-to-one 
mapping,  and  is  fundamentally  lossy.  The  quantized  coefficients  are  represented  as 
Y’  in  Fig.  1.1. 
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After  quantization,  all  of  the  quantized  coefficients  are  ordered  into  a 
particular  sequence.  This  ordering  helps  to  facilitate  the  runlength-based  entropy 
coding  by  placing  low-frequency  coefficients  (which  are  more  likely  to  be  nonzero) 
before  high-frequency  coefficients  (which  are  more  likely  to  be  zero).  The  coefficient 
sequence  is  further  divided  into  subsequences.  The  format  of  each  subsequence  is 
’0...0b’  or  ’b’,  where  ’0’  and  ’b’  represent  a zero  and  a nonzero  coefficient, 
respectively.  In  coding  terminology,  different  subsequences  can  be  represented  by 
different  symbols.  In  addition,  each  symbol  needs  to  be  assigned  a unique  codeword 
consisting  of  a sequence  of  O’s  and  Ts  so  that  it  can  be  uniquely  decoded  by  the 
decoder.  Since  some  symbols  are  more  likely  to  be  sent  than  others,  assigning  shorter 
codewords  to  more  probable  symbols  and  longer  codewords  to  less  likely  symbols  can 
reduce  the  average  bit  rate.  This  is  called  runlength-based  variable-length  codeword 
assignment  or  simply  runlength-based  entropy  coding. 

The  compressed  image  data  consist  of  a series  of  codewords  produced  by 
entropy  encoder.  At  the  decoder  side,  entropy  decoder  is  simply  a reverse  process  of 
the  entropy  encoder  at  the  encoder  side.  Similarly,  the  dequantizer  is  simply  a 
reverse  process  of  the  quantizer.  Note  that  the  only  information  loss  of  the  original 
source  image  data  comes  from  the  quantizer.  The  entropy  encoder,  entropy  decoder, 
and  dequantizer  introduce  no  loss  of  the  information.  Thus,  the  output  of  the 
dequantizer  is  the  same  as  the  output  of  the  quantizer.  Finally,  at  the  output  from 
the  decoder,  the  inverse  transform  produces  an  M x N data  block  denoted  as  X’  to 
form  one  block  of  the  reconstructed  image.  In  general,  X’  will  not  be  the  same  as  the 
original  data  X.  It  can  be  regarded  as  an  approximation  of  X. 

This  dissertation  focuses  mainly  on  the  fast  inverse  transforms  at  the  decoder 
side.  The  rest  parts  of  the  block  diagram  in  Fig.  1.1  are  out  of  the  scope  of  this 
dissertation.  They  will  be  discussed  only  when  necessary. 
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1.4  Dissertation  Organization 

Following  this  chapter,  Chapter  II  reviews  the  literature  on  non-progressive 
and  progressive  image  coding  techniques.  Chapter  III  describes  the  mathematics  and 
general  theory  related  to  progressive  transform  codings.  Chapter  IV  introduces 
principles  of  fast  and  smooth  progressive  inverse  transforms.  In  Chapter  V,  the  theory 
and  principles  discussed  in  Chapter  III  and  Chapter  IV  are  used  to  develop  fast 
inverse  transform  algorithms  for  progressive  reconstruction  of  transformed  images. 
Chapter  VI  presents  some  simulation  results  to  illustrate  the  use  of  the  developed 
algorithms  and  to  compare  the  performance  of  the  developed  algorithms  and  other 
fast  algorithms.  Chapter  VII  summarizes  this  study  and  suggests  directions  for  future 
research. 


CHAPTER  II 
LITERATURE  REVIEW 


2.1  Introduction 

The  review  is  organized  as  follows.  First,  an  overview  of  image  data 
compression  and  its  techniques  is  given.  Second,  the  research  on  the  PIT  (progressive 
image  transmission)  is  discussed,  and  is  followed  by  a short  review  of  the 
development  of  fast  PIT.  Finally,  the  general  approach  taken  by  this  dissertation  is 
outlined. 


2.2  Overview  of  Image  Data  Compression 
Image  data  compression  has  been  a research  area  for  more  than  two  decades. 
Recently,  this  field  has  grown  tremendously.  New  communication  systems  such  as 
integrated  services  digital  network  (ISDN),  digital  high-definition  television  (HDTV), 
videoteleconferencing,  videophone,  and  facsimile  transmission  have  triggered  much 
research  work  in  the  area.  Most  researches  are  aimed  at  the  development  of  coding 
schemes  for  efficient  transmission  of  image  data.  The  general  principle  of  coding 
schemes  is  to  exploit  the  statistical  redundancy  among  image  pixels  such  that  images 
can  be  represented  by  fewer  bits.  So  far,  many  coding  schemes  and  their  variations 
have  been  developed.  Basically,  these  coding  schemes  can  be  classified  either  as  lossy 
or  lossless  schemes  in  terms  of  their  ability  to  reconstruct  the  coded  images.  Using 
the  lossless  schemes,  original  images  can  be  perfectly  recovered  after  decoding. 
Usually,  the  attainable  compression  ratio  of  lossless  schemes  is  rather  low, 
somewhere  between  2:1  and  3:1  depending  on  image  statistics.  The  lossy  schemes, 
on  the  other  hand,  can  be  designed  to  achieve  any  desired  compression  ratio  at  the 
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expense  of  the  quality  of  the  reconstructed  image  after  decompression.  The  lossy 
coding  schemes  are  effective  because  our  human  visual  system  can  not  detect  a minor 
difference  between  the  original  image  and  its  reconstructed  version.  Most  researchers 
concentrate  their  studies  on  lossy  coding  schemes  because  they  can  provide  much 
higher  compression  than  lossless  schemes  without  seriously  degrading  the  image 
quality.  Since  this  dissertation  research  has  little  to  do  with  lossless  coding,  only  lossy 
image  coding  schemes  are  reviewed  here. 

In  this  literature  review,  lossy  image  coding  techniques  are  broadly  classified 
into  eight  categories:  (1)  predictive  coding,  (2)  block  truncation  coding,  (3)  vector 
quantization,  (4)  subband  coding,  (5)  neural  network  techniques,  (6)  human  visual 
system,  (7)  image  processing  and  pattern  recognition  methods,  and  (8)  transform 
coding. 

2.2.1  Predictive  Coding 

It  is  well  known  that  for  most  images  there  is  a strong  correlation  between 
neighboring  pixels.  Predictive  coding  exploits  this  correlation.  In  basic  predictive 
coding  system,  an  approximate  prediction  of  the  sample  to  be  encoded  is  made  from 
previously  coded  information  that  has  been  transmitted.  The  error  (or  differential 
signal)  resulting  from  the  subtraction  of  the  prediction  from  the  actual  value  of  the 
pixel  is  quantized  into  a set  of  L discrete  amplitude  levels.  These  levels  are  then 
represented  as  binary  words  of  fixed  or  variable  word-lengths  and  sent  to  the  channel 
coder  for  transmission. 

In  summary,  the  predictive  coder  has  three  basic  components:  (1)  predictor, 
(2)  quantizer,  and  (3)  code  assigner.  Depending  upon  the  number  of  levels  of  the 
quantizer,  a distinction  is  often  made  between  delta  modulation  (DM),  which  has  L 
= 2,  and  differential  pulse  code  modulation  (DPCM),  which  has  L greater  than  2. 
Since  only  two  levels  are  used  in  the  DM,  in  order  to  get  an  adequate  picture  quality, 
the  sampling  rate  has  to  be  several  times  the  Nyquist  rate.  Although  the  DM  has 
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been  used  extensively  in  encoding  other  waveforms  (e.g.,  speech),  it  has  not  been 
found  to  be  useful  in  the  encoding  of  pictures.  This  is  due  perhaps  to  the  high 
sampling  rates  required  (Netravali  and  Haskell  [1988]),  In  general,  it  is  not  as  good 
as  transform  coding  in  terms  of  compression  ratio  (Jain  [1981]). 

2.2.2  Block  Truncation  Coding 

The  standard  block  truncation  coding  (BTC)  was  introduced  by  Delp  and 
Mitchell  [1979].  In  this  technique,  the  image  is  first  divided  into  small  nonoverlapping 
block  (usually  4 x 4 or  4 pkels  by  4 pixels)  and  then  BTC  is  applied  to  each  block 
independently.  The  heart  of  the  standard  BTC  is  a 1 bit  nonparametric  quantizer 
whose  output  levels  are  obtained  by  matching  sample  moments  of  the  data  before 
and  after  quantization.  For  this  reason,  a BTC  coder  is  also  known  as  a moment 
preserving  quantizer. 

The  original  algorithm  preserves  the  block  mean  and  the  block  standard 
deviation  (Delp  and  Mitchell  [1979],  Healy  and  Mitchell  [1981]).  If  the  value  of  a 
pbcel  in  a block  is  less  than  the  mean  in  that  block,  the  output  of  the  quantizer  will 
be  a ’0’  in  the  corresponding  position;  otherwise,  it  will  be  a T’.  Thus,  a block  of  ’0’ 
and  T’  is  obtained,  and  this  is  known  as  a bit  plane  or  bit  map.  The  information  of 
the  bit  plane  along  with  the  mean  and  the  deviation  must  be  sent.  With  this 
information,  the  decoder  can  reconstruct  an  approximate  version  of  the  original  data. 

There  have  been  other  choices  of  the  moments  which  result  either  in  a 
reduced  mean  squared  error  or  computational  complexity  (Halverson  et  al.  [1984], 
Lema  and  Mitchell  [1984],  Udpikar  and  Raina  [1985],  Udpikar  and  Raina  [1987], 
Alsaka  and  Lee  [1990]).  Other  variations  of  the  standard  BTC  include  BTC  with 
variable  block  size  (Kamel  et  al.  [1991]),  adaptive  BTC  for  minimizing  the  mean 
squared  error  of  the  quantizer  (Hui  [1990]),  interpolative  BTC  to  encode  partial 
image  data  and  interpolate  the  rest  (Zeng  [1991]),  adaptive  compression  coding 
formed  by  combining  BTC  with  quad-tree  coding  for  regions  which  are  smooth  and 
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regions  which  contain  an  edge  (Nasiopoulos  et  al.  [1991]),  using  medial  filter  roots 
to  further  compress  the  bit  plane  (Zeng  et  al.  [1991]),  using  one  single  bit  map  for 
three  color  components  (Red,  Green,  and  Blue)  of  a color  image  (Wu  and  Coll 
[1992]). 

The  biggest  advantage  of  the  BTC  is  its  low  computational  complexity. 
However,  even  with  so  many  variations  of  the  BTC,  to  achieve  a compression  of  1 
bpp  for  monochrome  images  and  still  to  be  able  to  reconstruct  excellent  quality 
images  is  still  a challenge  for  the  BTC.  On  the  contrary,  this  goal  has  been 
accomplished  by  transform  coding. 

2.2.3  Vector  Quantization 

Vector  quantization  (VQ)  operates  on  a group  of  pixels  instead  of  a single 
pbcel.  It  is  an  extension  of  scalar  quantization  to  a higher  dimensional  space.  VQ  has 
the  ability  to  exploit  memory  or  correlation  between  neighboring  input  pixels.  In 
image  compression,  the  vector  is  usually  formed  from  a sub-block  of  the  image, 
typically  a2x2or4x4  square.  The  encoder  and  decoder  share  a collection  of 
codewords  or  possible  reproduction  vectors  known  as  codebook.  The  distortion  or 
error  is  computed  between  an  input  vector  and  each  codeword  in  the  codebook  to 
find  the  minimum  distortion  codeword.  The  address  or  index  of  the  best  matched 
codevector  is  transmitted  to  the  receiver.  At  the  receiver,  the  received  address  is  used 
to  fetch  a codevector  from  the  codebook.  This  codeword  is  then  used  to  reconstruct 
the  image  block. 

Most  research  in  the  VQ  has  focussed  on  developing  algorithms  which  address 
the  following  three  major  problem  areas:  (1)  techniques  for  designing  a good 
codebook  that  is  representative  of  all  the  possible  occurrences  of  pixel  combination 
in  a block  (Linde  et  al.  [1980],  Vaisey  and  Gersho  [1988],  Marangelli  [1991]);  (2) 
development  of  techniques  for  generating  an  adaptive  codebook  (for  more  optimal 
coding)  which  updates  itself  based  on  the  image  or  sequence  being  coded  (Sun  and 
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Goldberg  [1985],  Goldberg  and  Sun  [1986],  Gersho  and  Yano  [1985],  Riskin  and 
Gray  [1991],  Modestino  and  Kim  [1992]);  and  (3)  techniques  for  finding  a best  match 
in  the  codebook  in  the  shortest  period  of  time  (Nyeck  et  al.  [1992])  . 

The  real-world  coding  results  shows  that  the  VQ  can  achieve  little  distortion 
at  rates  as  low  as  0.5  bpp  (Modestino  and  Kim  [1992]).  However,  the  VQ  has  still 
not  been  made  feasible  for  real-time  video  transmission  due  to  the  huge  efforts 
required  in  establishing  and  searching  its  codebook. 

2.2.4  Subband  Coding 

The  basic  idea  of  the  subband  coding  (SBC)  is  to  divide  the  frequency  band 
of  the  signal  into  a number  of  subbands  by  a bank  of  digital  bandpass  filters.  Each 
subband  is  then  translated  to  baseband  by  down-sampling  and  encoded  separately. 
For  reconstruction,  the  subband  signals  are  decoded  and  up-sampled  back  to  the 
original  frequency  band  by  interpolation.  The  signals  are  then  summed  up  to  give  a 
close  replica  of  the  original  signal.  An  example  of  subband  codec  is  shown  in  Fig.  2.1. 
The  filters  in  the  encoder  are  known  as  analysis  filters  and  those  in  the  decoder  are 
called  synthesis  filters. 

Subband  coding  of  one-dimensional  speech  signals  was  introduced  by 
Crochiere  et  al.  [1976].  The  one-dimensional  subband  decomposition  has  been 
theoretically  extended  to  two-dimensional  subband  filtering  by  Vetterli  [1984]. 
However,  the  application  of  two-dimensional  subband  image  coding  was  first 
reported  by  Woods  and  O’Neil  [1986].  Two-dimensional  finite  impulse  response 
(FIR)  quadrature  mirror  filters  (QMF)  were  used  by  them  to  perform  band  partition. 
Furthermore,  they  used  16-band  filter  system  instead  of  4-band  system  shown  in  Fig. 
2.1.  That  is,  each  subband  in  Fig.  2.1  is  further  divided  into  4 subbands.  Each  of  the 
16  subbands  was  encoded  by  DPCM.  Bits  are  then  optimally  allocated  among  the 
subbands  to  minimize  the  mean-squared  error  for  DPCM  coding  of  the  subbands. 
They  showed  experimentally  that  the  performance  of  image  subband  coding  can  be 
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Fig.  2.1  A subband  codec. 
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very  competitive  with  major  image  coding  techniques  available  at  that  time  including 
the  DCT-based  transform  coding  and  vector  quantization. 

Woods  and  O’Neil’s  approach  soon  became  the  standard  for  subband  image 
coding.  Since  that  time,  many  variations  of  subband  coding  have  been  proposed  to 
improve  the  performance  of  the  standard  approach.  Finite  state  vector  quantization 
was  used  by  Kim  et  al.  [1988]  to  encode  the  subbands.  Westerink  et  al.  [1988] 
proposed  an  optimal  bit  allocation  algorithm  for  subband  coding  and  showed  a 
superior  rate-distortion  performance  than  that  by  the  standard  approach.  Gall  and 
Tabatabai  [1988]  proposed  a symmetric  short  kernel  filter  to  replace  the  QMF 
because  long  tap  QMF  are  required  for  near  perfect  reconstruction  which  increases 
the  implementational  complexity.  A psychophysically  justified  bit  allocation  algorithm 
for  subband  image  coding  was  proposed  by  Perkins  and  Lookabaugh  [1989].  They 
found  that  the  psychophysically  justified  algorithm  is  superior  to  the  minimum  mean- 
squared  error  algorithm  at  low  bit  rates.  Li  and  He  [1989]  proposed  a directional 
subband  image  coding  scheme  where  signals  are  split  into  frequency  bands  according 
to  directional  components  in  order  to  match  the  human  visual  sensitivity.  Smith  and 
Eddins  [1990]  showed  that  high  quality  analysis/synthesis  systems  can  be  constructed 
with  a significant  reduction  in  the  computational  complexity  when  infinite  impulse 
response  (HR)  filters  are  used.  Jeon  and  Kim  [1991]  proposed  a new  technique  for 
designing  linear  phase  QMF  pairs.  With  this  method,  QMF  pairs  of  short  kernel 
filters  can  be  designed  which  have  the  perfect  reconstruction  property  and  relatively 
sharp  and  symmetrical  transition  characteristics  in  frequency  responses.  While  Kim 
et  al.  [1988]  used  finite  state  vector  quantization  to  encode  the  subbands,  Kim  and 
Modestion  [1992]  employed  more  sophisticated,  yet  more  effective,  techniques  called 
entropy-constrained  vector  quantization  and  entropy-constrained  predictive  vector 
quantization. 
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In  summary,  two  major  research  issues  are  involved  in  subband  coding.  First, 
there  is  a need  to  design  efficient  digital  FIR  and/or  HR  filters  such  that  the  aliasing 
signals  are  eliminated  and  original  signals  can  be  reconstructed  perfectly.  Second, 
there  is  a need  to  encode  each  subband  efficiently  in  order  to  achieve  a very  low  bit 
rate  and  maintain  a good  image  quality.  In  general,  subband  coding  can  lead  to 
subjectively  improved  image  reconstructions  without  the  blocking  effects  which  often 
occur,  for  example,  in  the  case  of  two-dimensional  discrete  cosine  transform  coding 
or  memoryless  vector  quantization  of  images  (Perkins  and  Lookabaugh  [1989], 
Lookabaugh  and  Perkins  [1990],  Kim  and  Modestion  [1992]).  It  is  reported  in  Kim 
and  Modestion  [1992]  that  excellent  quality  image  reconstructions  at  average  rates 
as  low  as  0.3  bpp  for  typical  real-world  images  can  be  achieved.  Another  advantage 
is  that  the  errors  in  encoding  a subband  are  confined  to  that  particular  subband. 
Thus  the  quantization  noise  from  the  encoder  in  a subband  gets  reflected  back  into 
the  same  subband  and,  hence,  does  not  mask  the  (weaker)  signal  in  another  subband 
(Jayant  and  Noll  [1984],  Woods  and  O’Neil  [1986],  Nanda  and  Pearlman  [1992]). 

The  disadvantage  of  the  subband  coding  is  the  high  computational  complexity 
due  to  the  implementation  of  filtering  operation.  For  example,  by  using  the  most 
efficient  polyphase  structure,  8-tap  and  32-tap  FIR  QMF  require  4 multiplications 
and  4 additions  and  16  multiplications  and  16  additions  per  output  point,  respectively 
(Smith  and  Eddins  [1990]).  If  tree  structure  is  used,  16-tap  and  32-tap  FIR  QMF 
require  48  multiplications  and  48  additions  and  96  multiplications  and  96  additions 
per  output  point  (Husoy  [1991]).  In  comparison,  a fast  discrete  cosine  transform  takes 
only  1.5  multiplications  and  7.3  additions  per  output  point  for  a 8 x 8 data  block  (e.g. 
Cho  and  Lee  [1991]).  However,  the  HR  filters  may  be  comparable  to  fast  transforms 
since  these  filters  are  more  efficient  than  the  FIR  filters  in  achieving  the  same 
filtering  functions  (typically  by  a factor  of  5 to  30)  (Lim  [1990]).  However,  the  HR 
filters  are  more  difficult  to  design  than  the  FIR  filters.  Stability  is  never  an  issue  in 
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design  and  implementation  of  the  FIR  filter,  but  for  the  HR  filter,  testing  filter 
stability  and  stabilizing  an  unstable  filter  without  significantly  affecting  the  magnitude 
response  are  very  big  tasks  (Lim  [1990]). 

2.2.5  Neural  Network  Techniques 

Choosing  the  proper  codewords  is  the  biggest  difficulty  in  Vector 
Quantization,  and  usually  requires  adaptive  data  clustering.  This  can  be  done  by  a 
neural  network  clustering  technique  where  weights  (synaptic  strength)  between 
neurons  are  adaptively  sensitized  to  the  prevailing  input  patterns.  The  asymptotic 
values  of  the  weights  define  the  codebook  entries.  This  approach  was  studied  by 
Antonini  et  al.  [1990],  Kohno  et  al.  [1990],  Chang  et  al.  [1991],  Huang  and  Huang 
[1991],  and  Lu  and  Shin  [1992].  The  results  of  their  studies  do  not  show  any 
superiority  over  traditional  methods  for  training  vector  quantizers.  They  merely  show 
that  neural  network  technique  is  an  alternative  for  training  vector  quantizers. 

The  other  approach  is  the  indirect  way  of  compressing  images  with  transform 
method  using  the  optimization  technique  provided  by  the  neural  network.  The  basic 
ideas  can  be  described  as  follows:  An  image  can  be  treated  as  a two-dimensional 
matrix.  Transform  coding  is  nothing  but  multiplying  the  original  image  with  another 
matrix  to  get  the  matrix  whose  elements  will  have  less  correlation  than  those  in  the 
original  image.  Using  certain  cost  functions  and  several  training  pictures,  the  neural 
network  can  simulate  the  function  of  the  matrix  that  is  used  for  image 
transformation.  This  type  of  approach  can  be  viewed  as  a new  form  of  image 
transform  coding  based  on  a transform  which  is  neither  linear  nor  orthogonal. 
Unfortunately,  the  performance  of  this  approach  is  inferior  to  the  traditional  DCT 
approach  (Huang  et  al.  [1991]).  This  approach  has  been  investigated  by  Ersoy  and 
Chen  [1989],  Mougeot  et  al.  [1991],  Huang  et  al.  [1991]  with  different  neural 
networks  and  optimization  techniques. 
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Another  application  of  neural  network  in  image  coding  is  to  improve  the 
performance  of  Block  Truncation  Coding  (BTC)  (Qiu  et  al.  [1991]).  In  a traditional 
BTC,  a threshold  is  determined  by  the  statistical  property  of  the  picture  block  of 
interest  (for  example,  the  mean  value  of  the  gray  levels  in  the  block).  Given  the 
threshold,  the  gray  level  picture  block  is  converted  to  a binary  picture  block.  Again, 
using  the  optimization  power  of  the  neural  network,  an  optimal  threshold  is 
determined  in  terms  of  the  least  mean  squared  error  in  the  picture  block. 

2.2.6  Human  Visual  System  (HVSl 

Ultimately,  the  only  meaningful  test  of  an  image’s  fidelity  is  how  it  "looks"  - 
presumably  to  a human  observer.  Previous  approaches  have  utilized  error  norms  such 
as  mean  squared  error  (MSE)  which  does  not  necessarily  correlate  well  with 
perceptual  notions  of  visual  fidelity.  In  other  words,  standard  measures  of  signal 
information  do  not  necessarily  coincide  with  visually  significant  image  attributes. 
Modern  image  coding  methods  that  attempt  to  capitalize  on  recent  computational 
models  of  visual  information  processing,  rather  than  using  classical  information- 
theoretic  approaches,  belong  to  this  type  of  coding  scheme.  One  example  is  the 
combination  of  transform  coding  and  the  modulation  transfer  function  (MTF)  of  the 
HVS  model  (Chitprasert  and  Rao  [1990]).  The  coefficients  of  a transformed  image 
can  be  altered  by  MTF  in  frequency  domain  of  the  following  form 

/f=a(Z>+(^exp{(-c^"‘^  (2-1) 

where  f is  the  radial  frequency  in  cycles/degree  of  the  visual  angle  and  a,  b,  c,  d are 
constants  that  are  determined  by  experiments.  Another  example  is  called  visual 
pattern  image  coding  (VPIC)  (Chen  and  Bovic  [1990]).  VPIC  is  operated  by  coding 
the  images  to  be  transmitted  using  a small  set  of  visual  patterns  which  are  localized 
subimages  containing  visually  important  information. 
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2.2.7  Image  Processing  and  Pattern  Recognition  Methods 

This  type  of  methods  employs  some  techniques  in  image  understanding  such 
as  edge  detection,  segmentation,  contour  analysis,  texture  analysis,  etc.  The  basic  idea 
of  the  methods  is  to  divide  the  images  into  certain  segments  or  types  and  code  each 
portion  of  the  image  with  different  codes.  The  idea  was  first  described  in  Kunt  et  al. 
[1985].  They  called  the  coding  schemes  based  on  this  idea  the  second  generation 
image  coding.  Further  studies  in  this  direction  include  Kunt  et  al.  [1987],  Thyagarajan 
et  al.  [1987],  Alter-Gartenberg  and  Narayanswamy  [1990],  Radha  et  al.  [1990],  and 
Gilge  [1990]. 

In  the  first  stage  of  the  method,  the  pixels  of  image  are  segmented  into 
contour  pixels  and  texture  pixels.  This  procedure  partitions  the  image  into  a set  of 
adjacent  regions  under  the  constraint  that  the  variation  of  the  grey  level  within  the 
region  does  not  contain  any  sharp  discontinuities,  i.e.,  contours.  Segmentation  is 
carried  out  in  three  steps:  preprocessing,  region  growing,  and  elimination  of  artifacts. 
The  preprocessing  is  intended  to  reduce  the  local  granularity  of  the  original  image 
without  affecting  its  contours,  so  that  not  too  many  small  regions  are  obtained  after 
region  growing.  Furthermore,  these  small  regions  do  not  correspond,  in  general,  to 
real  objects  of  the  original  image  and  thus  become  false  contours.  For  region 
growing,  a region  must  be  characterized  with  some  property  first,  for  example,  the 
gray  level  of  a pixel,  the  variation  of  the  gray  level,  or  the  energy  within  a given 
frequency  band.  Then,  starting  with  a given  pixel  (called  seed)  in  the  picture,  its 
neighboring  pixels  are  examined  to  see  whether  they  share  the  same  property.  If  this 
is  the  case,  that  pixel  is  included  in  the  region  and,  in  turn,  its  neighboring  pixels  are 
examined,  and  so  on.  When  there  are  no  more  pixels  left  to  be  connected  to  the 
region,  the  procedure  stops  and  restarts  at  any  other  pixels  which  is  not  included  in 
the  first  region.  The  segmentation  is  complete  when  all  the  pixels  of  the  picture  are 
assigned  to  some  region.  After  region  growing,  some  undesirable  artifacts  are 
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obtained:  Contours  which  do  not  completely  separate  two  regions  and  contours  that 
are  two  pixels  wide. 

In  the  second  stage  of  the  method,  the  contour  coding  technique  is  carried  out 
in  two  steps.  Since  regions  are  close,  contour  points  along  the  border  of  two  adjacent 
regions  are  described  twice,  once  for  each  region.  In  the  first  step,  these  points  are 
removed  from  one  of  the  regions  to  be  described  and  coded  only  once.  In  the  second 
step,  remaining  contour  segments  are  described  in  a three-mode  procedure: 
(l)approximation  by  line  segments,  (2)  approximation  by  circle  segments,  and  (3)  no 
approximation.  Of  course,  other  contour  coding  methods  can  be  applied. 

In  the  final  stage  of  the  method,  the  textural  coding  techniques  are  also 
carried  out  in  two  steps.  Note  that  within  each  region  there  is  no  longer  any  sharp 
discontinuity  and  hence  the  variation  of  the  grey  level  within  a region  can  be 
described  with  a smooth  two-dimensional  polynomial  function.  In  the  first  step,  the 
general  shape  of  the  gray  level  in  each  region  is  approximated  by  a smooth  two- 
dimension  polynomial  function.  In  the  second  step,  the  granularity  removed  with 
preprocessing  is  added  back  in  the  form  of  a pseudo-random  noise  to  render  the 
image  more  natural  and  less  "painted  by  numbers."  The  mean  squared  error  between 
the  original  image  and  the  image  reconstructed  with  polynomial  function  is  computed 
in  each  region.  This  error  is  used  to  control  the  variance  of  zero-mean  Gaussian 
pseudo-random  signal  added  as  microtexture  or  "salt-and-pepper." 

The  compression  ratio  for  the  methods  can  be  as  high  as  40  while  maintaining 
a visually  intelligible  image.  Unfortunately,  they  are  very  computationally  expensive. 

2.2.8  Transform  Coding  Techniques 

Natural  image/video  scenes  contain  a lot  of  redundancy  in  the 
spatial/temporal  domain.  To  reduce  the  redundancy,  transform  coding  approach 
attempts  to  map  these  highly  correlated  image  samples  to  another  domain  in  which 
image  coefficients  become  statistically  independent.  For  the  one-dimensional  case. 
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the  image  is  divided  into  mutually  exclusive  blocks  and  typical  block  sizes  are  1x8 
and  1 X 16.  Larger  block  sizes  offer  better  compression  up  to  a point,  at  the  expense 
of  increased  complexity.  The  block  of  data  is  then  considered  as  a vector  and  is 
linearly  transformed  by  multiplying  it  by  a transform  matrix.  The  aim  is  to  transform 
an  original  vector,  which  has  energy  equally  distributed  among  the  samples,  to  a new 
space  where  most  of  the  energy  is  concentrated  in  only  a few  coefficients.  Data 
compression  can  then  be  achieved  by  coding  only  those  coefficients  that  have 
substantial  energy. 

An  optimal  transformation  should  satisfy  two  conditions:  (1)  all  transform 
coefficients  become  statistically  independent  and  (2)  the  energy  is  compacted  into  a 
minimum  number  of  transform  coefficients.  Under  the  condition  of  stationarity  and 
Gaussianess,  the  Karhunen-Loeve  transform  (KLT),  which  constructs  the  image  basis 
vector  set  corresponding  to  the  normalized  eigenvalues  of  the  covariance  matrix  of 
the  original  image,  has  been  proven  optimal  in  terms  of  the  two  conditions  indicated 
above.  Unfortunately,  the  KLT  is  difficult  to  compute  and  has  no  fast  algorithm  in 
general.  As  a consequence,  tremendous  efforts  have  been  made  to  approximate  the 
KLT  by  a family  of  fast  sinusoidal  unitary  transforms  such  as  discrete  Fourier 
transform  (DFT),  discrete  cosine  transform  (DCT),  discrete  sine  transform  (DST), 
discrete  Hartley  transform  (DHT)  (Bracewell  [1986]),  scrambled  real  discrete  Fourier 
transform  (SRDFT)  (Ersoy  and  Chen  [1988]),  and  DCT/DST  alternate  transforms 
(Rose  et  al.  [1990]).  The  DCT-based  transform  is  generally  considered  to  be  the  best 
approximation  to  the  KLT,  especially  for  the  first-order  Markov  signals.  In  fact,  it  is 
being  made  as  a coding  standard  for  still  picture  compression  (by  the  JPEG  or  Joint 
Photographies  Experts  Group)  and  is  widely  supported  by  major  vendors. 

Nonsinusoidal  orthogonal  unitary  transforms,  such  as  Walsh  Hadamard 
transform  (WHT)  (Prat  [1966]),  Haar  transform  (HT)  (Andrew  [1970]),  high- 
correlation  transform  (HCT)  (Clarke  [1985]),  M-transform,  and  slant  transform  (ST) 
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(Prat  [1974]),  are  also  used  for  image/video  compression  due  to  their  computational 
efficiency  over  the  DCT-based  fast  sinusoidal  family.  The  Reed-Muller  transform 
(RHT)  (Reddi  and  Pai  [1988]),  which  derives  from  the  Reed-Muller  canonical 
expansion  of  Boolean  functions  over  GF(2)  (Galio  Field),  and  the  modified  Hermite 
transform  (MHT)  (Haddad  and  Akansu  [1988]),  which  is  based  on  the  Hermite 
polynomial  expansions,  also  fall  into  this  category. 

In  two-dimensional  transform  codings,  an  image  is  first  divided  into  many  sub- 
blocks (usually  8 X 8 or  16  X 16).  These  blocks  are  then  transformed,  quantized, 
coded,  and  transmitted  independently.  The  correlation  within  each  block  is 
substantially  reduced  or  eliminated.  The  correlation  between  different  blocks, 
however,  remains  high.  The  problems  associated  with  this  are  twofold.  On  one  hand, 
remaining  high  correlation  affects  the  compression  performance.  On  the  other  hand, 
it  creates  some  artifacts  across  the  block  boundary,  especially  for  low  bit  rate  coding 
due  to  the  coarse  quantization  of  transformed  coefficients,  which  is  commonly  known 
as  "block  effect,"  "edge  effect,"  or  "tile  effect".  The  "block  effect"  seems  to  be  one  of 
the  major  disadvantages  of  transform  coding  and  is  also  the  main  limitation  of 
transform-based  low  bit-rate  coding.  One  simple  method  to  reduce  "block  effect"  is 
to  interleave  data  blocks  before  the  transform  is  performed.  However,  the  introduced 
overhead  due  to  the  block  overlapping  significantly  reduces  the  compression 
efficiency.  An  alternate  scheme  is  to  overlap  the  transform  basis  function  rather  than 
to  overlap  the  data  blocks,  such  as  the  lapped  orthogonal  transform  (LOT) 
(Cassereau  et  al.  [1989]).  A more  recent  approach  is  an  extension  of  the  LOT  called 
extended  lapped  transform  (Malvar  [1992]). 

In  Malvar  [1992],  the  author  pointed  out  that  transform  coding  and  subband 
coding  are  not  fundamentally  different.  We  can  view  a transform  operator  as  a 
critically  decimated  filter  bank  (Akansu  and  Haddad  [1992],  Vaidyanathan  [1993]) 
where  the  basis  functions  of  the  transform  are  the  impulse  responses  of  the  synthesis 
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filters  (Malvar  [1990]).  Similarly,  we  can  view  a filter  bank  as  a transform  with 
memory,  in  which  the  transform  of  a block  may  depend  not  only  on  that  block  but 
also  on  its  neighboring  blocks  (Vetterli  and  LeGall  [1989]). 

2.3  Progressive  Image  Transmission  (PITl 

Since  the  introduction  of  the  idea  of  progressive  transmission  by  Tanimoto  in 
1979  (Tanimoto  [1979]),  various  techniques  have  been  developed  to  provide  this 
feature.  Depending  on  where  the  progression  takes  place,  these  techniques  can  be 
divided  into  three  categories:  spatial  domain,  transform  domain,  and  pyramid- 
structured  techniques.  In  the  following  presentation,  we  begin  with  the  least  complex 
progressive  transmission  approach,  the  spatial  domain  techniques.  Next,  we  turn  to 
the  more  computation-intensive  approach,  the  transform  domain  techniques,  which 
generally  results  in  better  performance  at  the  expense  of  computation.  The  third 
approach,  the  pyramid-structured  techniques,  requires  moderate  to  high 
computational  effort  and  generates  fairly  good  results.  If  some  sections  of  the  image 
are  favored  to  be  developed  faster  than  others  in  the  PIT,  the  techniques  are  said  to 
be  nonhomogeneous.  Otherwise,  they  are  said  to  be  homogeneous. 

2.3.1  Spatial  Domain  Techniques 

In  spatial  domain  techniques,  progressive  transmission  is  achieved  by  directly 
refining  spatial  domain  data  over  several  iterations  of  transmission.  There  are 
basically  two  progression  fashions:  pbcel  value  refinement  and  spatial  resolution 
refinement.  In  the  first  fashion,  an  image  with  few  gray  scales  is  initially  transmitted, 
and  the  pixel  values  of  the  image  are  refined  gradually  by  allowing  more  gray  scales. 
In  the  second  fashion,  the  initial  image  uses  large  tiles,  each  containing  a block  of 
pixels,  and  the  tile  size  gradually  decreases  or  the  number  of  tile  patterns  gradually 
increases.  System  based  on  such  techniques  always  have  a simple  structure.  The 
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spatial  domain  techniques  can  be  further  divided  into  bit-plane  method,  tree-searched 
vector  quantization,  and  progressive  quantized  differential  pulse  code  modulation. 

2,3.3. 1 Bit-Plane  Method 

One  simple  way  to  realize  the  PIT  in  spatial  domain  is  to  organize  image  data 


bit  plane.  In  each  pass  of  transmission,  data  corresponding  to  one  bit  plane  are  sent. 
The  first  bit  plane  is  transmitted  first,  and  the  n-th  bit  plane  is  transmitted  last.  At 
the  receiver,  the  reconstruction  process  of  each  pkel  can  be  expressed  as  follows.  Let 
the  gray  level  of  a pixel  at  stage  i be  gj,  then 


where  aj  is  the  i-th  most  significant  bit  of  the  gray  level  value  of  a pixel.  Thus,  Uj  is 
either  0 or  1.  For  example,  if  n = 8,  then  the  sets  of  allowed  reconstruction  levels  are 
(0,  128)  for  1-bit  transmission,  (0,  64,  128,  192)  for  2-bit  transmission,  ...,  and  (0,  1, 
2,  ...,  255)  for  8-bit  transmission. 

A better  way  to  realize  the  PIT  in  terms  of  signal-to-noise  ratio  may  be 
achieved  by  using  quantization.  For  example,  a uniform  quantizer  over  the  gray-level 
range  0 to  255  can  be  applied  to  images  with  an  8-bit  integer  gray  scale.  For  the  1-bit 
quantizer,  two  reconstruction  levels  are  allowed  since  they  can  be  identified  by  a 1- 
bit  symbol.  The  best  scheme  for  this  case  is  to  represent  gray  levels  from  0 to  127  by 
their  mean,  63,  and  to  represent  gray  levels  from  128  to  255  by  191.  Therefore,  the 
sets  of  allowed  reconstruction  levels  are  (63,  191)  for  1-bit  transmission,  (31,  95,  159, 
223)  for  2-bit  transmission,..,,  and  (0,  1,  2,....,  255)  for  8-bit  transmission.  At  the 
receiver,  the  gray-level  value  of  a pixel  at  stage  i is 


where  aj  is  as  defined  previously  and  |_xj  is  the  largest  integer  that  is  smaller  or 
equal  to  x. 


gj=aj2'’'^+a22"  ^+  -+a,2" ' 


(2.2) 


5^  -0.5 


(2.3) 
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Systems  employing  the  bit-plane  method  require  minimum  complexity,  and  no 
distortion  is  introduced  during  processing.  However,  the  flexibility  and  performance 
of  their  approach  are  very  poor  since  a minimum  rate  of  1 bpp  is  required  to 
establish  the  initial  approximation  and  only  a binary  image  can  be  reconstructed  at 
that  rate.  Also,  a minimum  incremental  bit  rate  of  1 bpp  is  required,  and  high  quality 
pictures  are  expected  at  5 to  6 bpp. 

2.3. 1.2  Tree-Searched  Vector  Quantization 

The  major  feature  of  tree-searched  vector  quantization  (VQ)  is  that  the 
codewords  are  hierarchically  structured  into  a tree  so  that  a search  for  the  minimum- 
distance  codeword  can  be  done  by  successive  search  through  the  tree.  An  example 
of  the  vector  size  of  6 pixels  (3  x 2)  is  shown  in  Fig.  2.2.  At  each  node  the  encoder 
selects  the  minimum-distance  codeword,  sends  a l-bit  symbol  to  the  receiver  to 
identify  which  branch  it  follows,  and  advances  down  to  the  next  node.  To  support 
progressive  transmission,  the  image  is  initially  coded  using  the  two  codewords  in  the 
first  level  of  the  tree,  and  the  1-bit  labels  identifying  the  codewords  are  sent  to  the 
receiver  to  build  the  first  crude  picture.  The  coding  procedure  is  then  moved  down 
to  the  next  level,  and  the  corresponding  1-bit  symbols  are  sent  to  refine  the 
reconstruction.  The  procedure  continues  until  the  finest  reconstruction  is  reached. 

The  codebook  (a  complete  set  of  codewords)  shown  in  the  example  is  chosen 
arbitrarily  for  illustration  purpose.  In  fact,  codebook  design  is  a research  topic  by 
itself.  Three  factors  must  be  considered  in  codebook  design  for  VQ.  They  are  the 
size  of  a codebook,  the  complexity  to  generate  or  search  the  codebook,  and  the 
distortion  (subjective  or/and  objective)  associated  with  the  codebook  to  reconstruct 
an  image.  In  practical  application,  the  codebook  keeps  only  important  or 
representative  codewords  not  every  possible  codeword.  A general  impairment  in 
reconstructed  images  using  VQ  is  the  blocky  appearance  due  to  vector  formation  and 
inadequate  reproduction  vectors  (codewords).  As  with  other  VQ  systems,  tree- 
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Fig.  2.2  Examples  of  three-level  tree-searched  VQ  for  a vector  size  of  3 x 2 pixels. 
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searched  VQ  also  suffers  the  drawback  of  heavy  computational  load  in  codebook 
design.  However,  its  encoding  complexity  is  relatively  low,  when  compared  with  the 
exhaustively  searched  VQ.  An  important  feature  of  any  VQ  is  the  simplicity  of 
decoding,  which  can  be  implemented  as  a table  look-up  operation,  but  the  encoding 
processing  is  usually  very  complicated. 

2.3. 1.3  Progressively  Quantized  Differential  Pulse  Code  Modulation  (DPCMI 

Since  typical  image  data  always  show  nonuniform  distributions,  nonuniform 
quantizers  are  superior  in  achieving  low  distortion.  The  DPCM  techniques  are  often 
employed  to  improve  the  efficiency  of  quantizers  by  converting  the  widely  spread 
original  data  into  the  very  concentrated  prediction  errors.  In  a conventional 
approach,  a quantizer  with  predetermined  resolution  is  used  to  quantize  the 
prediction  errors.  To  render  progressive  transmission  capability,  the  prediction  errors 
can  be  progressively  quantized  with  improving  accuracy.  There  are  two  major 
approaches  to  providing  progressive  quantization:  multistage  quantization  and 
embedded  quantization. 

In  the  multistage  quantization  approach,  input  data  are  initially  quantized  by 
a coarse  quantizer  and  the  resultant  quantization  error  is  then  quantized  by  a finer 
quantizer  (Wang  and  Goldberg  [1988]).  The  new  quantization  error  can  be  quantized 
by  an  even  finer  quantizer,  and  this  refining  process  can  be  continued  until 
satisfactory  accuracy  is  reached.  Fig.  2.3  is  a block  diagram  of  multistage 
quantization.  The  design  of  nonuniform  quantizers  involves  knowledge  of  the 

t 

distribution  model  of  data  in  each  of  the  multiple  stages.  In  the  typical  DPCM 
systems  the  prediction  error  can  be  well  modeled  by  a Laplacian  density.  Thus,  the 
nonuniform  quantizer  matched  to  a Laplacian  density  can  be  employed  in  the  first 
stage,  where  the  prediction  error  is  quantized.  The  quantization  errors  in  subsequent 
stages  tend  to  be  more  spread  and  can  be  well  approximated  by  uniform  distribution. 
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Fig,  2,3  Diagram  of  multistage  quantization. 
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For  rendering  practical  progressive  transmissions,  a 1-bit  quantizer  can  be  used  in 
every  stage. 

The  second  approach  to  progressive  quantization,  embedded  quantization,  is 
easier  to  implement  and  more  efficient  in  achieving  low  distortion.  The  basic 
principle  behind  the  embedded  quantization  is  the  alignment  of  quantization 
thresholds  of  different  resolutions  so  that  the  quantizer  outputs  corresponding  to  a 
lower  resolution  quantizer  are  embedded  in  those  corresponding  to  a higher 
resolution  quantizer.  Therefore,  embedded  quantization  allows  input  data  initially 
quantized  at  a coarse  resolution  to  be  refined  thereafter  by  sending  an  additional  bit 
in  each  step.  An  example  of  embedded  quantization  is  shown  in  Fig.  2.4  for  1-bit  to 
3-bit  resolution.  Since  all  successive  approximations  can  be  implemented  as  table 
look-up  operations  in  the  embedded  quantization  approach,  it  its  more 
computationally  efficient  than  the  multistage  approach,  where  a subtraction  is  needed 
in  every  stage. 

2.3.2  Transform  Domain  Techniques 

The  principle  behind  the  2-D  transform  coding  is  that  in  the  transform  domain 
the  energy  of  signal  tends  to  cluster  in  low  frequency  regions.  Therefore,  a few 
transform  coefficients  in  low  frequency  regions  can  be  used  to  make  a reasonable 
reconstruction  of  the  original  image.  Adding  more  coefficients  can  refine  the 
reconstruction.  This  feature  makes  the  2-D  orthogonal  transforms  extremely 
attractive  for  progressive  transmission.  Among  various  discrete  orthogonal  transforms, 
the  discrete  cosine  transform  (DCT)  is  the  most  popularly  employed  technique  for 
its  high  compression  efficiency  and  fast-algorithm  implementation.  Thus,  all  systems 
presented  in  this  subsection  are  assumed  to  use  a 2-D  DCT.  In  general,  transform 
domain  techniques  for  the  PIT  can  be  divided  into  three  categories;  scanning  pattern 
technique,  multistage  quantization,  and  bit-slicing  technique. 


For  Input  Data  "X" 


1 -bit  Output  "1 " 


2-bit  Output  "1 0" 


0 


3-bit  Output  "101“ 


Fig.  2.4  Example  of  embedded  quantizers  for  1-3  bit  quantization. 


Fig.  2.5  Zig-zag  scanning  pattern  for  the  block  size  of  8 x 8. 
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2.3.2. 1 Scanning  Pattern  Techniques 

A scanning  pattern  approach  refers  to  any  technique  that  transmits  transform 
coefficients  in  the  order  following  a certain  scanning  pattern  through  the  2-D 
transform  coefficients.  Initially,  a small  number  of  coefficients,  determined  by  the 
scanning  pattern,  from  each  block  are  transmitted  to  the  receiver  to  generate  the  first 
approximation  of  the  image.  In  each  pass  of  subsequent  transmission,  a group  of 
coefficients,  also  determined  by  the  scanning  pattern,  from  each  block  are 
transmitted  to  the  receiver  to  refine  an  existing  reconstruction.  The  optimal  scanning 
pattern  should  be  the  one  that  sends  coefficients  in  the  descending  order  of  their 
variance  values  since  this  will  minimize  the  mean-squared  distortion  due  to 
untransmitted  transform  coefficients.  The  zigzag  pattern  is  a commonly  used  scanning 
pattern  in  threshold  transform  coding  to  improve  coding  efficiency.  This  pattern 
usually  generates  more  consecutive  zeros  for  coefficients  under  the  threshold,  and 
efficient  run-length  codes  can  be  employed.  An  example  of  the  zigzag  pattern  for  8 
X 8 DCT  coefficients  is  shown  in  Fig.  2.5.  Basically,  the  zigzag  pattern  intends  to  scan 
the  transform  domain  from  the  large  to  the  small  variance  values.  This  simple 
scanning  scheme  does  result  in  robust  and  near-optimal  performance  since  the  zigzag 
pattern  almost  matches  the  order  of  coefficient  variance  values.  An  article  by  Ngan 
[1984]  compared  progressive  schemes  using  different  scanning  patterns.  It  confirmed 
the  robustness  and  the  near-optimal  performance  of  the  zigzag  scanning  scheme. 

23.2.2  Transform  Domain  Multistage  Quantization 

The  multistage  quantization  technique  applied  to  spatial  domain  signals  can 
be  applied  to  transform  domain  signals  as  well.  In  the  transform  domain,  the 
variances  of  transform  coefficients  tend  to  be  much  larger  in  the  low  frequency 
region  than  in  the  high  frequency  region.  For  an  efficient  use  of  transmitted 
information,  a zonal  coding  scheme  that  allocates  more  bits  to  the  low  frequency 
coefficients  than  to  the  high  frequency  coefficients  can  be  applied.  From  the  second 
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stage  and  after,  the  distributions  of  quantization  errors  become  more  uniform  across 
the  whole  frequency  region.  Thus,  uniform  quantizers  are  adequate  in  subsequent 
stages.  The  variances  of  quantization  errors  in  subsequent  stages  do  not  vary 
dramatically  as  the  variances  of  transform  coefficients  in  the  first  stage.  However,  the 
bit  allocation  scheme  based  upon  rate-distortion  theory  is  still  advantageous  over  the 
uniform  allocation  scheme.  The  range  of  usable  bit  rates  for  this  approach  is  much 
wider  than  that  of  its  counterpart  in  spatial  domain.  The  initial  bit  rate  can  be  well 
below  1 bpp,  and  recognizable  pictures  can  be  achieved  even  at  less  than  0.1  bpp. 
During  subsequent  stages  of  quantization,  the  system  also  allows  a portion  of 
quantization  errors  to  be  refined  by  further  quantization,  which  implies  that  the 
incremental  bit  rate  can  be  less  than  1 bpp  too.  This  system  has  been  shown  to 
deliver  good  quality  pictures  at  less  than  2 bpp.  A typical  example  of  the  transform 
domain  multistage  quantization  can  be  seen  in  Wang  and  Goldberg  [1986]  and  Wang 
and  Goldberg  [1988]. 

2.3.2.3  Bit-Slicing  Method 

While  the  above  system  employs  multistage  quantization  to  quantize  the 
transform  coefficients,  the  bit-slicing  method  employs  embedded  quantization.  Since 
the  coefficients  are  quantized  by  embedded  quantizers,  the  coded  coefficients  can  be 
transmitted  at  any  intermediate  resolution  of  integer  bits  and  can  be  refined 
progressively  by  transmitting  additional  bits.  One  remaining  problem  is  to  determine 
the  order  of  transmission.  A technique  called  bit-sliced  bit  allocation  was  introduced 
(Tzou  and  Elnahs  [1986])  to  achieve  minimum  distortion  for  every  stage  of 
transmission.  An  example  of  bit  maps  and  the  corresponding  incremental  bit  maps 
is  shown  in  Fig.  2.6,  where  an  incremental  bit  rate  of  1 bpp  and  a block  size  of  4 x 
4 are  assumed.  Combined  with  efficient  embedded  quantization  and  efficient  bit 
allocation,  this  system  performs  far  better  than  the  zigzag  scanning  pattern  approach. 
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Fig.  2.6  Bit  maps  (left)  and  incremental  bit  maps  (right)  for  a block  size  of 
4x4  and  an  incremental  rate  of  1 bpp. 


35 


When  compared  with  the  transform  domain  multistage  quantization,  this  approach 
appears  to  achieve  superior  performance  due  to  its  effective  quantizer. 

2.3.3  Pyramid-Structured  Progressive  Transmission 

Pyramids  have  been  found  to  be  a useful  tool  in  many  image  processing 
applications.  The  nature  of  pyramid  structure  is  ideal  for  progressive  transmission. 
To  form  pyramids,  an  image  is  successively  reduced  in  size  through  certain  processing 
such  as  pixel  averaging  and  filtering.  The  reduced  image  has  far  fewer  samples; 
however,  in  general,  it  still  provides  enough  intelligence  about  its  content.  While  the 
image  is  successively  reduced  to  form  a pyramid,  the  difference  between  two  layers 
of  the  pyramid  is  also  calculated.  The  reduced  image  on  the  top  of  the  pyramid  (the 
smallest  size)  can  be  used  for  initial  transmission,  and  it  can  be  expanded 
progressively  by  adding  differential  information  between  two  consecutive  layers  of  the 
pyramid.  The  techniques  to  form  pyramids  generally  fall  into  two  major  categories: 
the  tree-structured  pyramid  and  the  successively  filtered  pyramid. 

2.3.3. 1 Tree-Structured  Pyramid 

In  the  tree-structured  pyramid,  the  reduced  picture  in  a higher  layer  of  the 
pyramid  is  formed  by  constructing  each  new  pixel,  called  a father,  from  a number  of 
pixels,  called  sons,  of  the  picture  in  a current  layer  of  the  pyramid.  If  each  father 
node  is  connected  to  two  son  nodes,  it  is  called  a binary-tree  pyramid.  If  each  farther 
node  is  connected  to  four  son  nodes,  usually  in  a 2 x 2 configuration,  it  is  called  a 
quadtree  pyramid.  Let  yk(i,j)  be  a pixel  at  the  (i,j)-th  position  in  the  k-th  layer.  A 
quadtree  pyramid  can  be  specified  by  the  relationship 

(2-4) 

where  T[.]  is  a reduction  rule.  When  pixel  averaging  is  applied,  the  reduction  rule 
becomes 
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T\a,b,c,d\={a*b*c^d)IA  (2.5) 

When  a 2-D  subsampling  is  applied,  the  reduction  rule  is  simply 

'I\a,b,c,d\=x',  x=a,  b,  c,  or  d (2-6) 

When  an  image  is  represented  in  a pyramid  form,  information  required  to 
reconstruct  the  image  is  the  reduced  image  in  the  highest  layer  of  the  pyramid  and 
the  differential  information  between  two  consecutive  layers  of  the  pyramid.  The 
above  information  can  be  sent  directly  without  quantization  for  lossless  reproduction 

r 

or  using  quantization  to  further  accomplish  data  compression. 

The  above  discussion  is  the  most  primitive  pyramid  coding.  More  complicated 
and  effective  pyramid  coding  can  be  seen  in  Tanimoto  [1979],  Sloan  and  Tanimoto 
[1979],  Knowlton  [1980],  Hill  et  al.  [1983],  Sanz  et  al.  [1984],  Dreizen  [1987], 
Mongatti  and  Alparone  [1989],  Wang  and  Goldberg  [1989a,b],  and  Goldberg  and 
Wang  [1991]. 

2332  Successively  Filtered  Pyramids 

Burt  and  Adelson  [1983]  introduced  a more  general  technique  to  form  a 

pyramid.  To  generate  a higher  level  image,  filtering  is  applied  to  the  current  image 

and  a decimation  is  followed.  The  filter  is  generally  chosen  to  be  low-passed.  The  set 
¥ 

of  successively  low-pass  filtered  images  is  called  the  Gaussian  pyramid  since  the  net 
effect  of  successive  low-pass  filtering  of  certain  specific  types  converges  to  that  of  a 
Gaussian  shape  filter.  The  difference-image  between  two  consecutive  layers  is 
generated  by  expanding  the  higher  level  image  and  subtracting  it  from  the  lower  level 
image.  The  set  of  difference-images  is  called  the  Laplacian  pyramid  since  the 
operation  of  subtracting  a low-passed  image  from  itself  imitates  the  Laplacian 
operation.  Their  formation  of  pyramids  is  self-explanatory  with  Fig.  2.7. 

The  coding  technique  proposed  by  Burt  and  Adelson  [1983]  is  relatively 
straightforward.  They  considered  the  highest  level  Gaussian  image  and  all  the 
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Fig.  2.7  Burt  and  Adelson’s  pyramid  formation. 
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Laplacian  images  to  have  very  little  correlation,  and  they  applied  quantization  to  all 
these  images.  The  number  of  quantization  levels  for  each  image  is  experimentally 
determined  such  that  little  visual  degradation  is  noticeable.  Based  upon  their 
simulation,  high  quality  images  can  be  achieved  at  a bit  rate  of  less  than  2 bpp.  The 
system  complexity  is  comparable  to  that  of  transform  coding  systems.  Similar 
approaches  can  be  found  in  Arbeiter  and  Bessler  [1986]  and  Hofmann  and  Troxel 
[1986]. 

While  Burt  and  Adelson’s  system  employs  low-pass  filters  of  short  length,  a 
more  compression-efficient  system  can  be  achieved  using  the  2-D  quadrature  mirror 
filter  (QMF).  One  example  of  this  approach  can  be  found  in  Tran  et  al.  [1987]. 
During  pyramid  formation,  the  decimation  process  may  introduce  signal  aliasing,  and 
a low-pass  filter  should  be  applied  to  avoid  possible  aliasing.  On  the  other  hand, 
another  low-pass  filter  is  required  in  the  interpolation  process  to  remove  the 
unwanted  signal  in  the  high  frequency  band.  The  Laplacian  image  is  the  difference- 
image  between  two  consecutive  Gaussian  images  in  the  pyramid.  Since  the  Gaussian 
image  in  a higher  level  is  a decimated  low-pass  version  of  the  low  level  Gaussian 
image,  it  has  to  be  interpolated  to  the  same  size  as  the  lower  level  image  before  a 
pixel-by-pixel  subtraction  can  be  applied.  The  QMF  that  separates  input  signals  into 
high-band  signals  and  low-band  signals  appears  to  be  the  most  natural  process  to 
form  the  Gaussian  pyramid  and  the  Laplacian  pyramid  simultaneously.  When  a 2-D 
QMF  is  applied,  an  output  image  is  divided  into  four  regions  corresponding  to  high- 
band  and  low-band  signals  in  both  horizontal  and  vertical  directions.  Fig.  2.8  shows 
how  a Gaussian  pyramid  and  three  Laplacian  pyramids  are  formed  by  the  2-D  QMF. 

The  2-D  QMF  is  also  employed  to  form  subband  images  for  image  coding.  In 
the  subband  coding  approaches,  both  nondifferential  and  differential  codings  were 
employed.  In  the  nondifferential  coding  approach,  the  Laplacian  images  (signals  in 
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Fig.  2.8  Pyramid  formation  based  on  2-D  quadrature  mirror  filters. 
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subbands  except  for  the  lowest  one)  are  believed  to  have  little  spatial  correlation, 
and  linear  quantization  with  thresholds  in  the  center  to  suppress  small  variations  is 
applied  to  all  Laplacian  images.  The  highest  level  Gaussian  image  is  DPCM  coded. 
Since  the  quantized  Laplacian  images  contain  a lot  of  consecutive  zeros,  run-length 
codes  are  incorporated  to  improve  coding  efficiency.  In  contrast  to  the  nondifferential 
coding  approach,  a certain  amount  of  spatial  correlation  is  believed  to  remain  in  the 
Laplacian  images,  and  the  2-D  DPCM  coding  is  applied  in  the  differential  coding 
approach.  The  total  numbers  of  computations  per  sample  for  the  above  QMF  filters 
are  several  times  higher  than  that  for  a 16  x 16  fast  DCT. 

2.3.4  Nonhomogeneous  Image  Built-up  Techniques 

By  nonhomogeneous  we  mean  some  sections  of  the  image  are  favored  to  be 
developed  faster  than  others.  The  reason  of  doing  this  rather  than  using  traditional 
homogeneous  one  is  based  on  a simple  observation  that  the  subjective  information 
content  of  many  images  varies  by  area.  For  example,  an  image  of  a person’s  face 
centered  against  an  even  tone  background  contains  little  useful  information  in  the 
background  because  most  of  the  area  of  interest  lies  in  the  middle.  Consider  another 
example  where  mbced  graphics  and  text  are  in  one  image.  The  information  from  the 
texts  may  be  the  key  point  for  complete  and  early  recognition  of  the  image.  In  this 
case,  if  the  text  part  of  the  image  can  have  a higher  priority  than  the  non-text  part 
in  terms  of  picture  progression  speed,  the  capability  of  the  PIT  is  improved.  The 
term  nonhomogeneous  PIT  was  proposed  by  Dreizen  [1987].  Other  names  such  as 
interactive  detailing  (Sloan  and  Tanimoto  [1979]),  nonuniform  PIT  (Knowlton  [1980], 
Sanz  et  al.  [1984])  and  selective  fill  in  (Hill  et  al.  [1983])  are  also  used  in  the 
literature. 

The  idea  of  interactive  detailing  is  introduced  by  Sloan  and  Tanimoto  [1979]. 
The  interactive  detailing  allows  user  to  indicate  an  area  to  be  refined  further  but  stop 
refining  the  images  outside  the  assigned  area.  This  will  prevent  the  transmission  of 
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information  about  areas  of  the  image  which  are  uninteresting  to  the  user,  and  will 
allow  much  faster  refinement  of  the  important  details.  The  authors  mentioned  that 
some  methods  for  the  PIT  can  be  modified  for  this  special  function.  However,  how 
to  achieve  this  is  not  described  in  the  paper.  Finally,  the  authors  briefly  mentioned 
the  possibility  of  using  transform  domain  methods  (Fourier  and  Hadamard)  for  the 
PIT.  They  also  pointed  out  that  the  difficulty  of  the  transform  methods  comes  from 
their  computational  complexity. 

With  little  implementation  detail  and  no  theoretical  analysis,  Knowlton  [1980] 
suggests  a scheme  for  nonuniform  picture  development.  In  this  scheme,  the 
nonuniform  progression  was  produced  by  developing  nested  annular  squares  with 
progressively  larger  cells  in  the  more  peripheral  annuli.  He  tested  the  scheme  on  an 
image  where  a man’s  face  is  in  the  vicinity  of  the  center  of  the  image  and  he  found 
an  apparent  advantage  over  the  uniform  picture  development  because  the  face  can 
be  recognized  at  the  earlier  stages. 

In  Hill  et  al.  [1983],  the  authors  discussed  user  interaction  from  computer 
graphics’s  point  of  view  including  the  concept  of  selective  fill-in  and  fill-in  with  zoom. 
One  important  user  interaction  function  they  proposed  is  to  select  the  resolution  of 
display  and  the  display  area  of  interest.  The  selected  area  can  further  be  zoomed  in 
so  that  the  user  can  identify  the  picture  more  easily.  Basically,  this  approach  is  based 
on  what  we  call  a nonhomogeneous  pyramidal  structure.  In  other  words,  PIT  is 
performed  through  a pyramidal  structure  but  only  some  selected  part  of  the  pyramid 
at  certain  level  is  refined  to  the  desired  level  while  the  remaining  parts  of  the 
pyramid  stop  refining.  This  approach  has  the  following  drawbacks.  First,  the  block 
effect  is  so  obvious  at  the  early  stages  of  PIT  that  a reasonable  quality  of  the  picture 
can  only  be  achieved  when  more  than  half  of  the  levels  of  pyramid  are  received. 
Second,  the  boundary  line  between  selective  fill-in  and  non-selective  fill-in  creates 
an  artificial  boundary  to  the  original  image  which  will  make  the  recognition  of  the 


42 


whole  image  more  difficult  unless  the  objects  of  interest  are  completely  encircled  by 
the  selective  fill-in  boundary.  The  first  drawback  can  be  overcome  by  using  transform 
domain  technique  rather  than  spatial  domain  technique.  The  second  drawback  is 
caused  by  a discontinuous  jump  of  resolution  between  fill-in  area  and  non-fill-in  area. 
This  problem  can  be  relieved  if  we  can  make  a smoother  transition  of  resolution 
between  those  two  areas.  For  example,  we  can  make  the  refining  speed  in  the  non- 
fill-in area  decrease  exponentially  from  that  at  the  fill-in  area.  Functions  other  than 
the  exponential  are  also  possible. 

Sanz  et  al.  [1984]  criticized  previous  methods  as  not  having  good  quality  of  the 
intermediate  approximations  which  leads  to  only  a small  effective  compression  ratio. 
Three  different  and  compatible  techniques  are  presented  to  improve  this  quality:  use 
of  smoother  interpolation  schemes  in  its  generation,  a nonuniform  decomposition 
producer  to  favor  the  early  refinement  of  relevant  areas  of  the  image,  and  an 
extension  of  the  transmission  hierarchy  to  pixel  quantization  producing  successive 
refinement  of  the  gray  level  resolution  of  the  approximations. 

The  reason  to  use  smoother  interpolation  scheme  is  to  reduce  the  strong  brick 
effect  in  the  approximations  due  to  the  "big  pixels"  created  with  the  information 
available  at  every  stage.  The  nonuniform  decomposition  is  used  to  achieve  the  goal 
of  early  recognition  of  the  picture.  At  a certain  stage  of  the  transmission,  different 
layers  of  the  pyramid  coexist  in  the  different  sections  (favored  and  unfavored 
sections).  As  every  pyramid  layer  represents  a certain  outgrowth,  this  scheme  means 
that  the  information  saved  by  the  slow  development  of  the  unfavored  areas  is  spent 
in  refining  the  privileged  ones  by  increasing  the  depth  of  its  pyramid  in  advance. 
Finally,  different  privilege  areas  are  quantized  with  different  number  of  bits.  The 
area  with  higher  privilege  will  be  assigned  a higher  number  of  bits  for  quantization. 
This  quantization  scheme  can  further  enhances  the  compression  capability  of  the 
overall  PIT  coding.  Combining  the  use  of  smoother  interpolating  schemes. 
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nonuniform  transmission  strategies,  and  privilege  quantization  greatly  improve 
intermediate  approximations  in  progressively  transmitted  images,  while  minimizing 
the  amount  of  information  needed  to  reach  a certain  objective  and  subjective  quality 
level  at  the  receiver. 

It  is  well  known  that  nonhomogeneous  PIT  decreases  the  time  required  for 
the  viewer  to  recognize  image  content.  However,  in  general,  the  transmission  must 
include  positional  information,  in  addition  to  the  image  content.  The  usefulness  of 
a nonhomogeneous  PIT  is  inversely  proportional  to  the  overhead  incurred  by  the 
transmission  of  positional  information.  Dreizen  [1987]  proposed  a method  whereby 
the  positional  coding  and  the  image  content  coding  methods  are  integrated.  In  this 
method,  the  nonhomogeneous  PIT  is  achieved  by  using  a breadth-first  search  of  an 
unbalanced  quadtree.  The  branching  criterion  of  a node  in  the  quadtree  is 
determined  by  a simple  information  measure  for  a subimage  represented  by  that 
node.  If  the  difference  between  the  highest  valued  pbcel  and  the  lowest  valued  pixel 
in  a subimage  is  not  less  than  a certain  threshold,  this  subimage  is  decomposed  into 
four  subimages.  In  this  way,  some  subimages  will  be  refined  faster  than  others.  The 
method  is  lossless  because  the  threshold  decreases  from  some  predefined  numbers 
to  1 in  the  final  stage  of  the  PIT  where  a full  balanced  quadtree  is  formed.  The 
method  can  be  modified  into  a lossy  method  to  improve  compression  by  simply 
choosing  the  final  threshold  to  be  greater  than  1. 

2.4  Fast  Progressive  Image  Transmission 

In  general,  a fast  inverse  transform  can  not  be  started  without  receiving  all  the 
coefficients.  We  call  this  waiting  period  the  delay  time.  In  order  to  reduce  the  delay 
time,  Takikawa  [1983]  first  developed  a fast  progressive  reconstruction  algorithm  for 
discrete  Fourier  transformed  (DFT)  and  Walsh-Hadamard  transformed  (WHT) 
images.  He  also  pointed  out  that  the  algorithm  may  be  applied  to  the  DCT  because 
it  can  be  evaluated  through  the  fast  Fourier  Transform  (FFT).  Later,  Miran  and  Rao 
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followed  the  derivation  of  Takikawa  and  developed  a fast  progressive  reconstruction 
algorithm  for  the  DCT  images  (without  using  the  FFT  as  suggested  by  Takikawa) 
(Miran  and  Rao  [1990]). 

The  basic  idea  of  their  approach  is  to  decompose  the  entire  transformed 
images  into  log2N  + 1 sparse  matrices,  each  of  which  can  be  inverse  transformed  by 
lx  1,  2 X 2,  4 X 4,...,  and  N x N fast  inverse  transform  algorithms,  where  N is  the 
image  block  size  and  the  block  is  squared.  One  major  drawback  of  their  approach 
is  that  the  order  in  which  the  sparse  matrices  are  formed  and  sent  is  not  based  on 
the  order  of  visual  significance.  In  general,  a DCT  coefficient  with  higher  variance 
(or  energy)  tends  to  be  more  visually  significant  than  that  with  lower  variance.  This 
is  also  true  for  the  WHT  and  the  DFT  coefficients.  It  is  well  known  that  the  DCT 
coefficient  variances  are  highly  correlated  along  the  zig-zag  scan  (Tescher  and  Cox 
[1976]).  In  fact,  it  has  been  shown  that  the  variances  of  the  zig-zag  scanned  DCT 
coefficients  can  be  approximated  by  a linear  relationship  in  the  logarithmic  domain 
(Ngan  [1982]).  Takikawa’s  and  Miran  and  Rao’s  attempts  all  have  fixed  transmission 
patterns  that  do  not  even  close  to  the  zig-zag  scan.  Therefore,  poor  performances  are 
expected,  and  have  been  confirmed  experimentally  by  Miran  and  Rao  [1990].  They 
tested  the  algorithms  developed  by  Takikawa  for  DFT  and  WHT  and  their  own 
algorithm  developed  for  the  DCT.  A common  result  for  all  three  algorithms  was  that 
poor  and  somewhat  discontinuous  image  build  up  was  observed.  They  ascribed  the 
drawback  to  not  having  low  frequency  terms  immediately  adjacent  to  the  DC 
components  in  the  intermediate  stages  of  reconstruction.  In  addition,  they  conclude 
that  their  DCT-based  and  DFT-based  algorithms  are  inferior  to  the  WHT-based 
algorithm  by  a noticeable  margin  because  the  latter  has  a better  sequence  of  image 
build-up. 

Another  drawback  of  their  algorithms,  which  was  not  mentioned  by  them,  is 
that  they  require  all  elements  of  the  sparse  matrices  to  start  the  computation  of  the 
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inverse  transform.  Thus,  the  delay  time  is  reduced  but  not  eliminated.  For  the  first 
few  stages  of  the  PIT,  the  delay  time  is  not  serious  as  the  sizes  of  matrices  are  small. 
It  however  becomes  significant  in  later  stages. 

2.5  The  Approach  Taken  by  This  Dissertation 
In  this  dissertation,  a new  approach  to  fast  progressive  reconstructions  of 
transformed  images  is  proposed.  This  approach  is  fundamentally  different  from  those 
taken  in  recent  studies  by  Takikawa  [1983]  and  Miran  and  Rao  [1990]. 

The  proposed  approach  takes  advantage  of  the  feature  that  for  the 
transmission  of  most  images  the  input  matrices  to  an  inverse  transform  is  sparse.  This 
is  a very  import  feature  for  the  PIT  with  transform  domain  techniques.  Traditional 
fast  inverse  transforms  generally  do  not  consider  the  characteristics  of  the  input 
matrices  such  as  sparsity.  In  the  proposed  approach,  the  sparsity  is  exploited  by 
processing  the  nonzero  elements  in  the  matrix  one  at  a time.  The  contribution  due 
to  each  nonzero  element  is  then  added  to  get  the  final  result.  For  convenience,  the 
final  result  is  called  the  goal  matrix  and  the  contribution  due  to  each  nonzero 
element  is  called  the  partial  contribution  to  the  goal  matrix. 

To  calculate  the  partial  contribution,  a novel  technique  is  used  to  reduce  the 
required  computational  complexity.  This  technique  requires  that  the  basis  functions 
of  the  transform  under  consideration  have  some  types  of  symmetry  (eg.  even 
symmetry)  with  respect  to  themselves.  With  this  symmetry,  we  can  show  that  the 
partial  contribution  also  exhibits  the  similar  symmetry.  Thus,  only  part  of  the  partial 
contribution  needs  to  be  computed  through  regular  arithmetic  operations  of 
multiplications  and  additions.  The  other  parts  of  the  partial  contribution  can  be 
determined  by  a simple  duplication  process  with  essentially  no  multiplication  or 
addition.  Again,  the  assumption  of  symmetry  for  the  transform  basis  functions  is  by 
no  mean  strict.  In  fact,  this  assumption  is  valid  for  many  practical  and  commonly 
used  transforms. 
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Since  each  nonzero  element  in  the  input  matrix  is  processed  one  at  a time,  an 
immediate  advantage  of  the  proposed  approach  over  other  approaches  is  that  any 
scanning  sequences  of  the  transform  coefficients,  including  the  one  that  follows 
closely  the  ranking  of  coefficient  variances,  are  allowed.  Thus,  the  images  can  be 
reconstructed  progressively  with  any  degree  of  smoothness. 


CHAPTER  III 

A GENERAL  THEORY  FOR  PROGRESSIVE  TRANSFORM  CODING 

3.1  Introduction 

The  block  diagram  of  a typical  transform-based  coder  is  shown  in  Fig.  1.1.  To 
prepare  for  inputs  to  the  encoder,  source  data  are  grouped  into  nonoverlapping 
blocks,  each  of  which  has  MN  sample  data  in  length.  These  data  blocks  are  inputed 
to  the  unitary  transform.  At  the  output  from  the  decoder,  the  inverse  unitary 
transform  outputs  the  MN  sample  blocks.  In  the  encoding  process,  the  transformed 
coefficients  are  quantized,  encoded,  and  transmitted  through  a channel;  while  in  the 
decoder  process,  an  inverse  operation  is  performed  to  reconstruct  the  MN  sample 
blocks.  If  the  quantized  transformed  coefficients  in  a block  are  sent  and 
reconstructed  in  one  single  pass,  the  coder  is  said  to  be  in  sequential  mode.  On  the 
other  hand,  if  they  are  sent  and  reconstructed  in  multiple  passes,  the  coder  is  said 
to  be  in  progressive  mode. 

The  purpose  of  this  chapter  is  to  develop  a general  theory  for  a coder 
operating  in  progressive  mode.  The  theory  is  general  in  a sense  that  it  can  be  applied 
to  any  unitary  transform.  In  addition,  there  is  essentially  no  constraint  on  the  types 
of  data  to  be  processed  except  that  they  are  real.  The  typical  data  that  satisfy  this 
constraint  are  one  dimensional  speech  data,  two  dimensional  image  data,  and  three 
dimensional  video  data. 
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3.2  Basic  Principle  of  Signal  Expansions  in  Orthogonal  Functions 
In  this  section,  a brief  review  of  signal  expansions  using  orthogonal  functions 
is  given.  Two  complex  sequences,  {g(n)}  and  {f(n)},  which  are  defined  over  some 
common  interval  [0,  N-1],  are  orthogonal  if  their  inner  product  vanishes.  That  is, 

N-l 

E 8(n)f*(n)=^fin)g  *(n)  =0  (3.1) 

n=0  n=0 

where  f (n)  and  g (n)  are  the  conjugate  of  f(n)  and  g(n),  respectively.  Now,  suppose 
we  can  find  a family  of  N linearly  independent  sequences  (x/n),  Q<  r<  N-l}  defined 
over  the  interval  [0,  N-l].  This  family  is  orthogonal  if 


^ x/n)x,\n) 

n=0 

where  c^  is  the  norm  of  {x^(n)}  defined  as 


0, 


r=s 

otherwise 


(3.2) 


W-l  ]l/2 

E \Xr(n)\‘ 

/i=0 

The  orthonormal  family  is  obtained  by  the  following  normalization: 


norm[xJin)]  = 


(3.3) 


4>r(«)  = — J^r(«)  O^r^lV-1 

c. 


(3.4) 


and  this  gives  the  result 


N-\ 

n=0 

Any  nontrivial  set  of  functions  satisfying  eq.  3.5  constitutes  an  orthonormal 
basis  for  the  linear  vector  space.  Hence  {x(n)}  can  be  uniquely  represented  as 


where 


N-l 


4(n)=E  *A("). 

k=0 


O^n^N-l 


N-l 

xin)^*(n),  O^s^N-1  (3.7) 

n=0 

The  set  of  coefficients  {X^,  0<  s<  N-l}  constitute  the  spectral  coefficients  of  (x(n)} 
relative  to  the  given  orthonormal  family  of  basis  functions.  Classically,  these  are 
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called  generalized  Fourier  coefficients  even  when  {0r(n)}  not  sinusoidal.  In  this 
dissertation,  they  will  also  be  called  spectral  coefficients  or  weighting  coefficients. 

The  set  of  coefficients  {X^}  in  eq.  3.7  also  provides  the  least-square 
approximation  to  {x(n)}.  Suppose  we  want  to  approximate  {x(n)}  by  a superposition 
of  the  first  L of  the  N basis  sequences,  using  weighting  coefficients  {Yj,  i=0,  1,...,  L- 
1}.  Then  the  least-squares  estimates  for  these  coefficients  are 


Y.=X.,  i=0,  1,  2,  L-1 


The  proof  is  as  follows.  Let  the  approximation  be 


L-l 

x'(n)=Y:  W”) 

*=o 


and  the  error  is  then 


(3.9) 


t(n)=x(n)-x'(n) 

The  {Yj}  are  to  be  chosen  to  minimize  the  sum  of  squared  errors 


(3.10) 


|e(n)P 

n=0 

Expanding  the  latter  and  invoking  orthonormality  gives 


(3.11) 


n=0  n=0  \r=Q  y r=0 

For  convenience,  x(n),  Yj  and0^n)  are  assumed  to  be  real.  Next,  setting  the  partial 
derivative  of  8 l with  respect  to  Y^.  to  zero  gives 


de 


dY. 


\ 


/N-l 

- = -2  5^  Jf(n)<l)yn) 

\n=0  y 


-27^=0 


(3.13) 


with  solution 
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N-l 

y!  s=0,  1,  2,  - ,L-1  (3-14) 

n=0 

By  replacing  Y^,  r=0,  1,  2,  L-1,  in  eq.  3.12  with  eq.  3.14,  we  can  show  that  8^=0. 

Thus  the  sum  squared  error  in  using  x’(n)  as  an  approximation  to  x(n)  is  minimized 
by  selecting  the  weights  to  be  the  orthonormal  spectral  coefficients. 


3.3  Nonincreasing  Data  Error  with  More  Spectral  Coefficients 
If  the  coefficients  are  chosen  according  to  eq.  3.14,  we  can  show  that  the  sum 
of  squared  errors  cannot  be  increasing  when  more  spectral  coefficients  are  used. 
Equivalently,  we  want  to  show  CL+m  < 8l,  where  m>l  and  L+m<N. 

From  eqs.  3.9,  3.10,  and  3.11,  we  can  write 


N-l 

^L*m  ^ ^ 
n=0 


I+m-1 


m-  E W") 

k=0 


Replacing  x(n)  with  eq.  3.6,  eq.  3.15  can  be  rewritten  as 


N-l 

^L+m  ^ ^ 


This  will  lead  to 


E W")! 


/i=0  k=L+m 


(3.15) 


(3.16) 


7^-1  / N-l 


N-l 


\ 


E W«)  E W(«) 


n=0  \r=I+m 

By  invoking  the  orthonormality,  we  have 


s=L+m 


/ 


Thus, 


7^-1 


-E  |if 

r=L+m 


(3.17) 


(3.18) 


I+m-1 


^L*m 


- E Ft 

r=L 


^0 


(3.19) 


The  proof  is  complete. 

The  analysis  above  suggests  that  if  spectral  coefficients  Yq,  Yj,  ...,  Yl.i  are 
used  with  the  orthonormal  function  0^(n),  r = 0,  1, ...,  L-1,  to  approximate  the  sample 
data,  as  the  number  of  coefficients  L increases  the  sum  of  squared  errors  between 
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the  sample  and  the  approximated  data  values  either  decreases  or  remains  the  same 
(i.e.,  will  not  increase).  Also,  the  results  suggest  that  when  the  number  of  spectral 
coefficients,  L,  equals  the  total  number  of  sample  data,  N,  the  sum  of  squared  errors 
vanishes.  These  are  important  mathematical  properties  which  are  used  in  the 
progressive  transform  coding. 

Now,  suppose  we  are  to  use  only  a selected  number  of  spectral  coefficients  to 
approximate  the  sample  data  and  disregard  the  rest  of  the  coefficients.  For  example, 
in  one  approximation  we  select  only  three  coefficients,  Yj,  Y4,  and  Y7,  and  treat  the 
rest  of  the  coefficients  as  they  have  zero  coefficients  values,  and  in  another 
approximation  we  select  5 coefficients  Yq,  Y^,  and  Y7.  Under  such  selection 

processes,  it  is  no  longer  guaranteed  that  approximations  performed  using  more 
spectral  coefficients  would  not  be  worse  than  approximations  with  less  number  of 
coefficients  in  terms  of  their  sum  of  squared  errors.  For  example,  the  approximation 
using  the  5 selected  coefficients,  Yg,  Yj,  Y3,Y5,  and  Yj,  is  not  guaranteed  to  perform 
no  worse  than  the  approximation  using  three  coefficients,  Yj,  Y4,  and  Yj.  In  fact,  we 
can  show  that  under  such  selection  processes  the  approximation  performed  using 
more  spectral  coefficients  is  guaranteed  to  do  no  worse  than  the  approximation  with 
less  number  of  coefficients  only  if  the  coefficients  used  in  the  latter  approximation 
is  a subset  of  the  coefficients  used  in  the  former  approximation.  What  follows  are 
some  mathematical  proofs. 

Consider  a new  definition  of  x’(n)  as  follows. 

k=a 

where  a>0.  With  this  new  definition, 

'I'  E (3.21) 

r6[0,a-l] 

re[L^a^-l] 

^L+m  is  defined  as  in  eq.  3.18.  If  m>a, 
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E F/f  (3.22) 

re[0^-l] 

relL*aJ,+m-l] 

If  m = a, 

E r/  ^0  (3-23) 

re[0^-l] 

If  m<a  and  L+m>a-l,  we  have 

L \ri-  E n 

re[0^-l]  re[I+m»I+j-l] 

If  m<a  and  L+m<a-l,  we  have 

H-H..-  E \yf-  E 

re[0^+/«-l]  re[o4'^a-l] 

Based  on  eqs.  3.22  to  3.25,  we  can  say  that  if  m>a,  Cl+h,  < El  is  always  true. 
However,  if  m<a,  CL+m  ^ El  is  not  guaranteed.  In  general,  to  guarantee  El+h,  ^ 
the  L+m  coefficients  must  contain  all  the  L coefficients.  That  is,  the  L coefficients 
for  calculating  El  must  be  a subset  of  L+m  coefficients  for  calculating  EL+m- 

The  above  analysis  implies  that  one  decoder  which  receives  less  number  of 
spectral  coefficients  can  actually  approximate  sample  data  better  than  another 
decoder  which  receives  more  spectral  coefficients.  This  is  a reason  why,  in  many 
image  data  processing  applications,  "which  spectral  coefficients  to  use"  is  a more 
important  question  to  ask  than  "how  many  spectral  coefficients  to  use"  in  achieving 
a good  data  reconstruction.  To  put  it  another  way,  different  spectral  coefficients  play 
different  roles  in  and  make  different  contributions  to  the  data  reconstruction  process. 
Some  spectral  coefficients  have  more  weights  than  the  others  in  achieving  a good 
data  reconstruction.  In  the  progressive  transform  coding,  it  is  important  that  the 
coefficients  with  more  weights  be  transmitted  before  those  coefficients  with  less 
weights.  Under  this  coefficient  transmitting  strategy,  for  some  image  data,  it  is 
possible  to  achieve  a good  data  reconstruction  using  just  the  first  few  transmitted 
spectral  coefficients.  An  optimal  transmission  sequence  of  the  coefficients  is  therefore 
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the  ranking  order  of  the  weight  of  each  coefficient.  The  next  question  that  needs  to 
be  addressed  is  how  do  we  determine  the  weight  of  each  spectral  coefficient. 

3.4  Optimal  Transmission  Sequence  of  the  Coefficients 
In  this  section,  we  will  show  that  arranging  the  coefficients  in  descending  order 
of  their  magnitudes  is  the  optimal  transmission  sequence  in  terms  of  minimizing  the 
sum  of  squared  errors.  This  is  at  least  intuitively  correct  if  we  consider  the  energy 
preserving  property  of  the  orthonormal  transformation,  which  asserts  that 


JV-l 


N-l 


£ (3.26) 

n=0  it=0 

This  is  the  Parseval  theorem,  which  can  be  derived  from  eq.  3.6.  Spectral  coefficients 
with  large  values  correspond  to  high  energy.  So  a coefficient  with  larger  magnitude 
is  more  important  than  that  with  smaller  one  in  terms  of  energy  preserving  capability. 

A formal  proof  of  the  above  assertion  is  given  as  follows.  Let  e ; to  be  the  sum 
of  squared  errors  of  the  data  approximation  given  only  one  coefficient  Xj,  i.e.. 


7^-1 


N-l 


E W" 


'rE  W'’)'*,4i(n)l^-E 

n=0  n=0  }/c=0Jc*i 

Expanding  eq.  3.27  and  invoking  the  orthonormality  gives 


N-l 


1 


(3.27) 


Similarly,  we  have 


N-l  N-l 

'rE 

k=0Mi 


k=0 


(3.28) 


'rE  <3.29) 

*=0 

where  Cj  is  the  sum  of  squared  errors  of  the  data  approximation  given  only  one 
coefficient  Xj.  For  a data  sequence  x(n),  it  is  easy  to  see  (by  observing  eqs.  3.26,  3.28, 
and  3.29)  that 


e,.^e. 


for  any  i,j  e [0,N-1]  and  i^^j.  This  completes  the  proof. 


(3.30) 
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Given  a set  of  data  Xj(n)  we  can  always  find  their  spectral  coefficients  and 
determine  the  optimal  transmission  sequence  Sj  according  to  the  descending  order 
of  the  coefficient  magnitudes.  However,  for  another  set  of  data  X2(n)?‘Xi(n),  its 
optimal  transmission  sequence  $2  may  not  be  the  same  as  Sj.  In  other  words,  the 
optimal  transmission  sequence  is  data  dependent.  Therefore,  the  position  information 
of  the  coefficients  in  the  optimal  sequence  needs  to  be  coded  and  sent  to  the  decoder 
as  an  overhead  when  a new  set  of  data  is  processed.  This  overhead  may  be  too  large 
to  be  practical.  In  practice,  a suboptimal  but  data  independent  transmission  sequence 
is  often  used  in  image  coding  applications.  One  of  the  examples  is  the  zigzag 
sequence  shown  in  Fig.  2.5.  There  seems  to  be  a parallelism  between  optimal 
transform  and  the  optimal  transmission  sequence.  Both  are  data  dependent  and  both 
are  replaced  by  data  independent  counterparts  in  practical  applications. 

3.5  Optimal  Progressive  Transform 

The  unitary  transformation  that  completely  decorrelates  the  data  and  achieves 
the  maximum  energy  compaction  is  the  optimal  transform  --  Karhunen-Loeve 
transform  (KLT)  (Rao  and  Yip[1990]). 

Consider  a 1-D  unitary  transformation  i such  that 

Y=^*X,  X=^*Y  (3.31) 

where  X— [xq  , Xj  , X2  , ...,  and  Y— [yQ  > yi » y2  > •••>  Yn-i]  the  data  in  spatial 
domain  and  transform  domain,  respectively.  The  spatial-domain  data  are  always  real. 
However,  the  transform-domain  data  can  be  complex.  Let  the  orthonormal  sequences 
0r(k)  be  the  rows  of  a transformation  matrbc,  0 (r,k),  then  # = [0(r,k)].  Now  let  be 
a column  vector  representing  the  basis  sequence  {0,(k)},  i.e.,  = [0^(0),  0XI), 

0r(N-l)]‘.  X can  be  written  as  a weighted  sum  of  these  basis  vectors. 
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X= 
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(3.3 
E yr^r  2) 

r=0 


Let  the  truncated  representation  of  X be  X, 


L> 


(3.33) 


r=0 


The  mean  squared  error  introduced  by  the  truncation  in  eq.  3.33  is  given  by 


=E 


\ 


/J 


(3.34) 


z=E[{X-Xj)XX-XJ\ 

fN-i  y/^-1 
E>'r^r 

/ \''=i 

■E  ^."1 

r=L 

Where  X is  assumed  to  be  a real  random  variable  and  has  zero  mean.  Since  y,= 

#;x. 


N-\ 

'"E  4’*’. 

=E 

r=L 


(3.35) 


where  Rxx  is  the  auto-covariance  matrix  of  the  vector  X. 

To  obtain  the  optimum  transform,  we  want  to  find  the  basis  functions  that 
minimizes  e for  a given  L,  subject  to  the  orthonormality  constraint,  Using 

the  Lagrangian  multipliers,  we  minimize 


•'-E  - MV*.  - 1)1  <3-3«) 

r=L 

Taking  the  gradient  of  each  term  in  the  sum  with  respect  to  one  obtains 


~ 0 =*  ^xx^r  ~ 


which  implies 
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where 

A — A|,  ...  jA^_j) 

Hence  is  an  eigenvector  of  the  covariance  matrix  Rxx,  and  the  associated 
eigenvalue,  is  a root  of  the  characteristic  polynomial,  | - R^xl  • Since  R^x  is  a 

real,  symmetric  matrix,  all  { A^}  are  real,  distinct,  and  nonnegative.  The  minimized 
e is  then 

- i!  = E <3.38) 

r=I  r=L 

Thus,  e is  minimized  by  ranking  the  eigenvalues  A,  in  a descending  order.  The  set 
of  basis  vectors  obtained  will  diagonalize  the  auto-covariance  matrix  Rxx»  ^s  can  be 
seen  from 

Ryy  = = A (3.39) 

In  conclusion,  # is  the  unitary  matrix  for  the  KLT  that  generates  a diagonal 
Ryy  and  thus  completely  decorrelates  the  transformed  coefficients.  In  addition,  it 
repacks  the  total  signal  energy  among  the  first  L coefficients.  By  comparing  eq.  3.38 
and  eq.  3.34,  we  have 

X^Ely^  (3.40) 

Since  e is  minimized  by  ranking  the  eigenvalues  A^  in  a descending  order,  e is  also 

minimized  by  ranking  the  magnitudes  of  the  coefficients  y,  in  a descending  order. 

Recall  from  the  result  of  previous  section,  such  a coefficient  sequence  is  the  optimal 

transmission  sequence  of  the  coefficients  in  progressive  data  approximation. 

Therefore,  the  KLT  is  both  the  optimal  transform  and  the  optimal  progressive 
transform. 

It  is  noted  that  while  many  matrices  can  decorrelate  the  input  signal,  the  KLT 
both  decorrelates  the  input  perfectly  and  optimized  the  repacking  of  signal  energy 
(Akansu  and  Haddad  [1992]).  However,  it  is  also  clear  that  the  basis  functions  are 
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dependent  on  the  auto-covariance  matrix  Rxx  and  therefore  cannot  be 
predetermined.  Furthermore,  given  an  auto-covariance  matrix,  finding  the  basis 
functions  for  the  KLT  is  usually  computationally  involved.  For  all  these  reasons,  the 
KLT  is  seldom  used  in  practice.  Instead,  suboptimal  but  data  independent  transforms 
such  as  DCT  is  often  used. 

In  the  next  chapter,  the  principles  of  the  proposed  fast  and  smooth  progressive 
inverse  transforms  are  discussed.  The  proposed  approach  exploits  the  symmetrical 
property  of  the  basis  functions  to  reduce  the  computation  of  inverse  transforms.  How 
the  symmetry  of  the  basis  functions  is  exploited  and  what  transforms  have 
symmetrical  basis  functions  are  two  major  issues  discussed  in  the  next  chapter. 


CHAPTER  IV 

FAST  AND  SMOOTH  PROGRESSIVE  INVERSE  TRANSFORMATION 

4.1  Introduction 

In  the  progressive  transform  coding,  the  encoder  performs  basically  the  same 
tasks  as  that  in  the  sequential  transform  coding.  One  pass  of  encoding  process  is 
sufficient  to  generate  all  the  transformed  coefficients  for  progressive  transmission. 
However,  in  the  progressive  coding,  the  coefficients  are  divided  into  several  groups. 
For  convenience,  we  call  one  group  of  the  transformed  coefficients  a coefficient 
packet.  Each  packet  can  have  different  numbers  of  coefficients.  For  a given 
coefficient  packet,  one  pass  of  decoding  process  is  required.  One  group  of  the 
coefficients  is  sent  for  one  stage  of  data  refinement  or  approximation.  Since  there  are 
more  than  one  coefficient  packets  in  the  progressive  transmission,  multiple  passes  of 
decoding  process  are  required. 

The  multiple-pass  decoding  process  is  the  unique  feature  of  the  progressive 
coding.  In  contrast,  the  traditional  sequential  coding  requires  only  one  pass  of 
decoding  process.  This  unique  feature  allows  a progressive  data  reconstruction,  which 
has  many  practical  applications.  However,  there  is  a side  effect  associated  with  it. 
That  is,  the  increase  of  processing  burden  at  the  decoder  side.  Roughly  speaking,  if 
m-pass  decoder  is  used,  there  will  be  an  m-fold  increase  of  processing  complexity. 

A typical  transform-based  decoder  usually  consists  of  three  major  modules. 
They  are  entropy  decoding,  dequantization,  and  inverse  transformation.  Among  them, 
the  inverse  transformation  is  the  most  time-consuming  operation  as  far  as  the 
computational  complexity  is  concerned.  An  attempt  to  reduce  the  complexity  of 
inverse  transforms  in  progressive  decoding  for  image  data  was  first  made  by 
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Takikawa  [1983]  and  later  by  Miran  and  Rao  [1990].  Miran  and  Rao  observed  that 
both  Takikawa’s  and  their  own  algorithms  create  poor  and  discontinuous  image  built 
up.  Therefore,  their  algorithms  are  not  desirable  for  practical  use.  There  is  a need 
to  reduce  the  complexity  of  inverse  transforms  in  progressive  decoding  while 
maintaining  a good  performance  of  data  reconstruction.  Miaou  and  Tou  [1993a, 
1993b,  1993c]  proposed  a new  approach  that  is  desirable  for  practical  use.  Although 
Miaou  and  Ton’s  approach  is  dedicated  to  handling  the  DCT  image  data , their  idea 
can  be  extended  to  deal  with  other  types  of  data  and  transforms  under  a generalized 
framework  discussed  in  this  chapter. 


4.2  Traditional  Fast  Inverse  Transformation 
In  this  section,  the  general  principle  of  traditional  fast  transforms  is  first  given. 
Next,  some  inherent  drawbacks  associated  with  these  transforms  for  progressive 
coding  are  discussed. 

4.2.1  A General  Principle  of  Traditional  Fast  Transforms 

Let  X be  a N-component  vector  representing  one  dimensional  data.  Further, 
let  # be  any  N X N unitary  transform  matrbc  consisting  of  orthonormal  row  or  column 
vectors.  Also,  let  Y be  the  vector  of  transformed  coefficients.  Then, 


Y=^*X 

Alternatively,  it  can  also  be  written  as 


3^0 

4>00  <1>01 

1 
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... 

— 

4>io  4>ii  - 
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• • • 
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^N-l_ 

In  addition,  if  we  define  to  be  [0jo0ii 0i,N-i  ],  then 


(4.2) 
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>^0 

3’i 


From  eq.  4.1  and  the  fact  that  # 


■Vjv-i 


K-x 


'’N-l 


is  unitary,  we  have 


3^0 


yN-i 


Eq.  4.4  can  be  expressed  in  terms  of  a signal  flowgraph  as  shown  in  Fig.  4.1(a).  There 
are  N"  branches  in  the  signal  flowgraph,  where  a label  is  associated  with  each  branch. 
This  label  is  usually  called  a gain  which  is  defined  as  the  output  level  at  the  right  end 
of  the  branch  over  the  input  level  at  the  left  end  of  the  branch.  In  addition,  one  gain 
corresponds  to  one  element  in  matrix  $.  We  call  a gain  associated  with  a branch 
nontrivial  if  a nontrivial  multiplication  is  required  to  compute  the  output  of  that 
branch  given  a nontrivial  input.  Otherwise,  it  is  trivial.  For  example,  the  gain  of  1 is 
trivial.  Traditional  fast  inverse  transforms  will  try  to  minimize  the  number  of  the 
branches  whose  gains  are  nontrivial.  The  principle  to  perform  traditional  fast 
transforms  can  be  explained  by  an  example.  Consider  a simplified  problem  when 
N=2.  In  this  case,  eq.  4.4  can  be  written  as 


^00 

4>io 

yo 

(4.5) 

•*1. 

<1>11 

m 

yx. 

or  equivalently. 
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■^0  ^0^00 10 
•^1  ~3'o^oi 


(4.6) 


If  000  and  0jo  are  not  independent,  in  other  words,  if  there  exists  a simple  relation 
between  them  such  that  one  of  them  can  be  derived  from  another  with  an  operator 
P.  This  operator  can  perform  many  possible  operations  except  a nontrivial  real  or 
complex  multiplication.  For  example,  P can  be  a complex  number  j,  +1,-1,  etc.  The 
operator  is  dependent  on  the  actual  transforms  under  consideration.  Some  transforms 

(e.g.  the  KLT)  do  not  possess  such  a simple  operator.  Therefore,  in  general,  they  do 
not  have  fast  transforms. 

For  the  particular  example  shown  in  eq.  4.6,  let  0io  = P0oo  and  0„  = Q0oi  , 
where  P and  Q are  simple  operators  involving  essentially  no  real  or  complex 
multiplications.  Then,  eq.  4.6  can  be  written  as 


Two  additions  and  four  multiplication  are  associated  with  eq.  4.6  while  only  two 
additions  and  two  multiplications  are  required  in  computing  eq.  4.7.  Thus,  eq.  4.7  can 
be  computed  faster  than  eq.  4.6.  In  this  case,  we  call  eq.  4.7  a fast  transform  of  eq. 
4.6.  If  we  define  Zq  and  Zj  as 


we  can  get  the  signal  flowgraph  for  eq.  4.7  as  shown  in  Fig.  4.2(b).  The  signal 
flowgraph  of  eq.  4.6  is  shown  in  Fig.  4.2(a)  for  comparison.  Note  that  Fig.  4.2(b)  has 
one  more  stage  than  that  of  Fig.  4.2(a).  In  addition,  the  number  of  branches  with 
nontrivial  gains  have  been  reduced  from  four  in  Fig.  4.2(a)  to  two  in  Fig.  4.2(b).  A 
matrbc  representation  corresponding  to  Fig.  4.2(b)  is 


^0  =>'o4>oo  =(yo  +>'i^)4>oo 

=3'o<l>oi  =(yo+>'i<?)<l>oi 


(4.7) 


Zo=yo*yiP,  Zi=yo+yiQ 


(4.8) 
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By  comparing  eq.  4.5  and  4.9,  we  have 


(4.9) 


^(X)  ^10 

4^00  ® 

1 p 

4>oi  <1^11 

® 4^01. 

1 Q. 

(4.10) 


In  general,  for  N > 2,  the  signal  flowgraph  associated  with  the  fast  transform 
will  have  more  than  one  intermediate  stage.  This  is  shown  in  Fig.  4.1(b). 


4.2.2  Inherent  Drawbacks  for  Progressive  Inverse  Transforms 

For  progressive  inverse  transformation,  the  transformed  coefficients  are 
divided  into  several  coefficients  packets  and  only  one  packet  is  sent  to  the  channel 
for  one  stage  of  data  reconstruction.  The  coefficients  in  a packet  are  usually  sent  and 
received  in  a sequential  manner.  Given  one  coefficient  packet,  the  inverse  transform 
is  performed  to  have  one  stage  of  data  reconstruction.  An  inverse  transform  is 
performed  again  after  receiving  another  coefficient  packet.  Continued  in  this  way 
until  a desired  output  is  produced.  The  nature  of  coefficient  splitting  and  sequential 
transmission  in  progressive  inverse  transformation  leads  to  the  following  inherent 
drawbacks  of  traditional  fast  algorithms. 

Delay  time  consideration.  The  signals  at  the  inputs  of  a multi-stage  signal 
flowgraph  cannot  race  with  each  other  all  the  way  to  the  outputs  if  any  intermediate 
node  of  the  flowgraph  has  two  or  more  incoming  branches.  For  convenience,  these 
nodes  are  called  summation  nodes  because  the  information  in  that  node  cannot  be 
passed  to  the  node(s)  in  the  next  stage  until  a summation  of  all  incoming  signals  is 
performed  to  provide  the  correct  information.  The  signal  flowgraphs  associated  with 
fast  transforms  usually  have  many  summation  nodes.  This  is  understandable  because 
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Fig.  4.1  Signal  flowgraphs  of  a transform:  (a)  based  on  the  definition  of  the 
transform  and  (b)  its  fast  transform. 
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(b) 


Fig.  4.2  A special  case  of  Fig.  4.1  (N  = 2):  (a)  based  on  the  definition  of  the 
transform  and  (b)  its  fast  transform. 
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basically  all  fast  transforms  exploit  the  associative  property  of  numbers,  i.e, 
C(A+B)=AC+BC.  In  this  case,  the  node  taking  the  summation  of  A and  B is  a 
summation  node.  If  the  inputs  to  the  signal  flowgraph  are  presented  in  a sequential 
manner,  the  summation  nodes  may  produce  some  delay  because  the  incoming  signals 
to  the  nodes  do  not  arrive  at  the  same  time.  The  delay  can  be  accumulated  as  the 
signals  travel  through  the  flowgraph  and  reach  their  destinations.  The  more 
summation  nodes  a signal  flowgraph  has  the  more  serious  the  accumulated  delay  of 
the  signal  flowgraph  would  be.  The  accumulated  delay  can  be  modeled  by  an 
equivalent  delay  at  the  inputs  of  the  signal  flowgraph  in  which  every  summation  node 
is  delay  free.  That  is,  this  equivalent  delay  is  used  to  account  for  the  sequential 
nature  of  inputs  to  the  signal  flowgraph.  For  example,  an  N-point  fast  transform 
cannot  be  started  until  all  N points  data  are  ready  at  the  inputs  of  the  signal 
flowgraph. 

Computational  redundancy.  The  characteristic  of  the  input  data  (such  as  the 
sparsity)  is  generally  not  considered  in  traditional  fast  algorithms.  If  the  inputs  to  a 
signal  flowgraph  are  sparse,  i.e.,  only  part  of  the  inputs  are  nonzero,  many  signal 
paths  in  the  flowgraph  are  redundant  in  the  sense  that  they  do  not  produce  anything 
to  the  outputs  except  zeros.  As  the  inputs  become  sparser,  this  computational 
redundancy  becomes  more  obvious.  Therefore,  traditional  fast  algorithms  may  not 
be  efficient  in  dealing  with  highly  sparse  matrices. 

4.3  A Novel  Approach  to  Fast  Progressive  Inverse  Transformation 

In  view  of  the  drawbacks  of  traditional  fast  algorithms  discussed  in  the 
previous  section,  an  improved  approach  to  fast  progressive  inverse  transformation  is 
desirable.  The  rationale  of  one  such  approach  is  discussed  in  this  section. 
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4.3.1  Degree  of  Self-Svmmetrv  of  a Vector 

Let  Vn  = [vo  , Vj  , V2 , V[sj.i]  be  an  N-dimensional  row  vector.  Two  distinct 
components  Vj  and  Vj  of  the  vector  are  said  to  have  self-symmetry  with  respect  to 
(i+j)/2  if  they  have  one  of  the  following  relationships:  (1)  V;  = w,  the  two 
components  are  said  to  have  even  self-symmetry  or  simply  self-symmetry;  (2)  Vj  = -w, 
they  are  said  to  have  odd  self-symmetry  or  self-antisymmetry;  (3)  Vj  = w*  (vj  is  the 
conjugate  of  Vj  and  vice  versa),  they  are  said  to  have  conjugate  self-symmetry;  and 
(4)  Vj  = -Vj , they  are  said  to  have  conjugate  self-skewsymmetry.  In  all  four  cases,  Vj 
and  Vj  are  said  to  be  a self-symmetric  pair.  For  ease  of  exposition,  they  will  be 
referred  simply  as  symmetry,  antisymmetry,  conjugate  symmetry,  and  conjugate 
skewsymmetry,  respectively. 

If  N is  even,  let  Vn/2^^^  = [vo  , Vj  , V2  , ...,  Vn/m]  and  = , Vn/2+i  , 

''n/2+2  » -1  Vn.i]  be  vectors  consisting  of  the  left  part  and  the  right  part  of  V^,, 
respectively.  If  Vj  and  v^.i.!  form  N/2  symmetric  pairs  for  Q<i<N/2-l,  is  said  to 
have  N/2-degree  of  symmetry.  Further,  if  V;  and  VN/2.i.i  form  N/4  symmetric  pairs  for 
Q<i<N/4-l,  Vn/2^^^  is  said  to  have  N/4-degree  of  symmetry.  Using  the  same  rule,  in 
general  a 2p-dimensional  vector  is  said  to  have  p-degree  of  symmetry  if  its 
components  Vj  and  V2p.j.i  form  p symmetric  pairs  for  Q<i<p-1. 

Similarly,  if  N is  odd,  Vn/2^‘'>  = [vo  , Vj  , V2 , ...,  v^^.d/^]  and  Vn/2^'^^  = [V(n.i)/2m 
» ''(n-i)/2+2  > V(n_i)/2+3  , •••,  Vn_i].  Notc  that  the  component  V(n.i)/2  ‘s  not  in  Vn/2*^^^  nor 
in  y^/2^^^  when  N is  odd.  In  this  case,  a (2p+l)-dimensional  vector  is  said  to  have 
p-degree  of  symmetry  if  its  components  Vj  and  V2p_i  form  p symmetric  pairs  for  Q<i<p- 
1. 

The  above  definitions  can  be  conveniently  expressed  in  matrix  notation  by 
introducing  the  counter-identity  matrix  J 
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1 

0 


(4.11) 


For  any  matrix  R,  RJ  will  rearrange  the.  column  vectors  of  R in  a reverse  order.  On 
the  other  hand,  JR  will  rearrange  the  row  vectors  of  R in  a reverse  order.  Therefore, 
J is  simply  an  order-reversing  operator.  The  dimension  of  J will  be  omitted  for 
convenience  and  it  has  the  same  dimension  as  the  matrix  to  which  J is  applied. 

With  the  J operator  defined  in  eq.  4.11,  we  can  use  a short  hand  notation 
Vn  = [Vn/2^^^  = [Vn/2^^^  Vn/2^^^J]  to  indicate  that  V^,  has  N/2-degree  of  even 

symmetry  (assuming  N is  even).  Similarly,  we  use  Vn=[Vn/2*^^^  - 

to  indicate  that  has  N/2-degree  of  antisymmetry  (assuming  N is  even). 


4.3.2  Progressive  Approximation  in  Inverse  Transformation 

In  this  section,  we  will  formalize  the  general  theory  for  progressive 
approximation  of  the  original  data  given  a sequence  of  the  transformed  coefficients. 
For  convenience,  we  will  discuss  one  dimensional  data  first.  The  extension  to  two  and 
higher  dimensional  data  is  straightforward  since  multidimensional  transforms  can  be 
performed  by  the  repeat  use  of  one  dimensional  transforms. 

From  eq.  4.4,  we  have 


X=(<^*)'*T=<I>T 


(4.12) 


To  perform  the  data  reconstruction  as  smooth  as  possible,  we  use  only  one 
coefficient  at  a time  for  data  approximation.  For  convenience,  we  assume  that  the 
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transmission  sequence  of  the  coefficients  is  yg,  yj, yN.i-  This  transmission  sequence 
may  not  be  optimal  as  discussed  in  Chapter  III.  Let  Xj  be  the  i-th  approximation  of 
the  original  data  X when  the  transformed  coefficient  y;  is  received  by  the  decoder. 
Then, 


Xo=yo< 

=-^0  ^yi  ® 1 (4.13) 

Vi  =yo^o  ^yi  +•••  = V2 

If  quantization  or  round-off  effect  is  not  considered,  then  X^.j  = X.  Note  that  each 
approximation  requires  N multiplications  for  ^ and  N additions  for  Xj+yj^j'.  The 
number  of  multiplications  can  be  reduced  if  $j  possesses  some  degree  of  symmetry. 
For  example,  assume  i j has  even  symmetry  and  N is  even.  Then,  $ j can  be  divided 
into  a left  and  a right  part.  If  $ jj  and  $ are  used  to  denote  the  left  and  the  right 
parts  of  § j,  respectively,  then 

where  J is  the  N/2  by  N/2  counter-identity  matrix  defined  in  eq.  4.11.  Since  J is 
simply  an  order-reversing  operator,  it  requires  no  multiplication  or  addition.  From 
eq.  4.14,  the  number  of  multiplications  can  be  reduced  from  N to  N/2. 

If  $j  is  odd  symmetric,  eq.  4.14  can  be  rewritten  as 


(4.15) 


In  this  case,  no  additional  computations  are  needed  except  for  the  sign  change. 


If  N is  odd,  then  eq.  4.14  can  be  rewritten  as 

^«,(Ar-iy2  ^i,(N-l)p. 

Similarly,  eq.  4.16  can  be  written  as 


(4.16) 


[^iZ,  ^i,(N-l)p.  (4-17) 

Thus,  when  N is  odd,  the  required  number  of  multiplications  is  reduced  from  N to 
(N+l)/2. 
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The  number  of  multiplications  can  be  reduced  further  if  $ also  possesses  the 
symmetry.  Following  this  logic,  the  least  possible  number  of  multiplications  for 
is  simply  one. 

Now,  let’s  extend  the  above  discussion  to  the  two  dimensional  case. 


=[*' 


^00  ^01 
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•^0,^-1 
y 1,^-1 


ys-ifl  ■" 
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N-l 


(4.18) 


In  the  one  dimensional  case,  y^^  ;*  is  considered  for  the  i-th  approximation  of  the 
original  data.  In  the  two  dimensional  case,  yjj^/fj*  is  the  required  computation  for 
one  stage  of  approximation  of  the  original  data.  Let  us  consider  only  the  cases  where 
N is  even  and  $ j has  even  symmetry.  The  rest  of  the  cases  can  be  discussed  in  the 
same  manner  as  in  the  one  dimensional  case. 

Suppose  has  even  symmetry  but  does  not  have  any  symmetry.  Then  we 
can  rewrite  as 


iX 


o' 


iR 


^JL  ^JR 


♦ 


(4.19) 


Note  that  half  of  the  computation  can  be  saved  in  this  case.  Suppose  both  and 
are  even  symmetric,  then 


r 4c  4c  " 

A A ^ 

« t 
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% 

(4.20) 


Three  quarters  of  the  computation  can  be  saved  in  this  case.  Just  as  in  the  one 
dimensional  case,  if  §jL  or  itself  also  possesses  the  symmetry,  further 
computational  saving  is  possible.  Following  this  logic,  the  least  possible  number  of 
multiplications  for  yjj§j‘$j*  is  simply  one. 
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The  number  of  additions  can  also  be  reduced  by  the  use  of  symmetry.  Let’s 
start  from  the  one  dimensional  case  again.  Consider  the  following  summation 

yM^yj^j  (4.21) 

If  both  §j  and  have  the  same  degree  of  symmetry,  say  N/2-degree  of  symmetry  and 
N is  even,  then 


If  both  §i  and  $=  have  the  same  type  of  symmetry,  i.e.,  a =)0,  then 


yM^yj<^r 


(4.22) 


yi^‘iL^yj^jL 

i-iyj(^i^'iL+yj^jL\ 


(4.23) 


Thus,  eq.  4.21  also  has  the  same  degree  and  type  of  symmetry.  This  result  can  be 
extended  to  two  dimensional  case.  Consider  the  following  summation 


V-.^>-0*+v  (4-24) 

J\j  I j Jmn  m^n  ' ^ 

Suppose  $j,§j,  srid  all  have  the  same  degree  of  symmetry  of,  say,  N/2-degree 
and  N is  even,  then  by  the  use  of  eq.  4.20,  we  have 


I J •'mn  m^n 


( - !)•  • (-1)“  * 


mn  ffiL^nL 

If  a +;3  =A+B,  the  right  hand  side  of  eq.  4.25  can  be  simplified  to 


(4.25) 


(4.26) 


In  this  case,  the  number  of  additions  has  been  reduced  from  NxN  to  NxN/4.  If 
yjj$i$j  and  ymn^m'^n  ^Iso  have  N/4  degree  of  symmetry,  then  the  required  number 
of  addition  will  be  NxN/ 16.  In  other  words,  further  reduction  of  addition  is  possible 
if  both  operands  for  addition  have  lower  degrees  of  symmetry. 
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Based  on  the  above  analysis,  we  conclude  that  the  number  of  additions  can 
not  always  be  reduced  unless  some  conditions  are  met.  First,  two  operands  for 
addition  must  have  the  same  type  of  symmetry.  Second,  they  must  have  the  same 
degree  of  symmetry. 

4.3.3  Signal  Flowgraoh  Decomposition  and  Reduction 

To  develop  a signal  flowgraph  for  the  proposed  approach,  we  use  the  concept 
of  signal  flowgraph  decomposition  and  reduction. 

The  signal  flowgraph  corresponding  to  eqs.  4.4  was  shown  earlier  in  Fig. 
4.1(a).  Since  eq.  4.4  can  be  written  as  eq.  4.12,  the  signal  flowgraph  corresponding 
to  eq.  4.12  is  functionally  equivalent  to  that  of  eq.  4.4.  The  signal  flowgraph 
corresponding  to  eq.  4.12  is  shown  in  Fig.  4.3.  By  comparing  Fig.  4.1(a)  and  Fig.  4.3, 
we  say  that  the  signal  flowgraph  of  Fig.  4.1(a)  has  been  decomposed  into  functionally 
equivalent  set  of  subgraphs  as  shown  in  Fig.  4.3. 

Each  subgraph  in  Fig.  4.3  corresponds  to  an  orthonormal  basis  vector  for  the 
unitary  transform  under  consideration.  The  subgraph  has  one  input  node,  N output 
nodes  and  no  intermediate  nodes.  Some  output  nodes  are  redundant  in  the  sense  that 
they  can  be  derived  by  other  output  nodes  because  of  the  symmetrical  property 
associated  with  their  basis  vectors.  Those  redundant  output  nodes  and  the  branches 
connected  to  them  can  be  eliminated  from  the  subgraph  without  losing  any  useful 
information.  If  a basis  vector  has  multiple  degrees  of  symmetry,  several  ways  of 
reducing  the  subgraphs  are  possible.  However,  the  one  that  producing  the  least 
number  of  output  nodes  is  of  special  interest  to  us.  In  this  way,  the  subgraph  with  the 
least  number  of  output  nodes  is  irreducible  in  the  sense  that  further  reduction  will 
cause  the  loss  of  useful  information.  Such  an  irreducible  subgraph  is  called  a 
primitive  subgraph  for  convenience.  Once  every  subgraph  has  been  reduced  to  its 
primitive  form,  the  union  of  all  these  primitive  subgraphs  are  the  resulting  signal 
flowgraph  for  the  calculation  of  eq.  4.12. 
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Fig.  4.3  A decomposed  signal  flowgraph. 
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Since  some  output  nodes  of  the  original  subgraph  have  been  eliminated,  using 
primitive  subgraphs  will  produce  only  part  of  the  desired  result.  Thus,  a process  is 
required  to  recover  the  remaining  parts  of  the  desired  result.  Again,  this  can  be  done 
by  the  use  of  symmetrical  property.  In  fact,  this  process  is  simply  a reverse  process 
of  producing  primitive  subgraph  from  its  original  subgraph.  If  the  process  to  derive 
the  primitive  subgraph  from  the  original  subgraph  is  called  a reduction  process,  then 
the  process  to  recover  the  remaining  parts  of  the  desired  result  can  be  called  a 
expansion  process. 

The  primitive  subgraphs  for  one  dimensional  transforms  can  be  cascaded  to 
form  the  new  set  of  primitive  subgraphs  for  two  dimensional  transforms.  From  eq. 
4.18,  we  have 


N-l  N-l 


Eq.  4.27  can  also  be  written  as 


i=o  y=o 


(4.27) 


N-l  (N-l  \ 


(4.28) 


1=0  \y=o  y 

The  term  inside  the  parenthesis  of  eq.  4.28  has  the  same  form  as  eq.  4.12.  Thus,  the 
primitive  subgraphs  used  for  eq.  4.12  can  also  be  used  for  that  term.  Eq.  4.28  can 
also  be  written  as 


N-i  N-i 

X= 5^  where  w,.= (4.29) 

/=o  y=o 

Eq.  4.29  also  has  the  same  form  as  eq.  4.12.  Thus,  the  primitive  subgraphs  used  for 
eq.  4.12  can  also  be  used  for  eq.  4.29.  The  primitive  subgraph  with  input  Xj  and  the 
one  with  input  Xj  can  be  cascaded  to  form  the  new  subgraphs  for  two  dimensional 
input  data  yjj.  Since  every  output  of  Wj  is  the  input  to  eq.  4.29,  the  rule  for  cascading 
is:  at  each  output  of  the  first  subgraph,  the  second  subgraph  is  cascaded. 
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4.4  The  Symmetric  Basis  Functions  for  Unitary  Transforms 
In  the  last  section,  we  have  shown  that  symmetric  basis  functions  of  unitary 
transforms  can  lead  to  computational  savings  in  progressive  reconstruction  of  original 
data  from  the  transformed  coefficients.  In  this  section,  we  will  investigate  the 
symmetric  property  of  the  basis  functions  of  many  commonly  used  transforms.  We 
will  show  that  most  of  them  do  possess  the  symmetry  in  their  basis  functions.  Thus 
the  principle  discussed  in  previous  sections  of  this  chapter  is  general  enough  to  cover 
most  of  the  commonly  used  transforms.  The  discussion  will  start  with  the  data 
dependent  transform  — Karhunen-Loeve  Transform  (KLT),  followed  by  three  types 
of  data  independent  transforms  --  sinusoidal  transform,  polynomial  transform,  and 
rectangular  transform. 


4.4.1  Data  Dependent  Transform  - KLT 

As  pointed  out  in  section  3.4,  given  an  auto-covariance  matrix,  finding  the 
basis  functions  for  the  KLT  is  usually  computationally  quite  involved.  Analytical 
solutions  to  this  problem  are  available  for  only  a few  cases.  The  most  frequently 
quoted  one  is  when  the  auto-covariance  matrix  has  the  following  form  (Markov-I 
source): 


1 


P 

1 

P 


• • • 


• •• 


P 

P 

P 


N-l' 

N-2 

N-3 


(4.30) 


where  p is  the  adjacent  correlation  coefficient.  In  this  case,  the  analytic  solution  is 
(Ray  and  Driver  [1970]) 


1 - p 


1 - 2pcos(cop  + p 
where  {coj.}  are  the  positive  roots  of 


(4.31) 
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tan(N<ji^  = 

The  resulting  unitary  matrix  $ is 


- (1  - p^)sm(o)^) 
cos(co^)  - 2p  + pcos(o)^) 


(4.32) 


(|)(r^) 

For  p =0.91  and  N = 8, 


2 

sm 

N + 


0) 


. (r  . l)i 


(4.33) 


0.3266  0.3492 

0.3645 

0.3722 

0.3722 

0.3645 

0.3492 

0.3266 

0.4732  0.4239 

0.2925 

0.1042 

-0.1042  -0.2925 

-0.4239 

-0.4732 

0.4694  0.2183 

-0.1692 

-0.4510 

-0.4510  -0.1692 

0.2183 

0.4694 

0.4278  -0.0756 

-0.4832 

-0.2788 

0.2788 

0.4832 

0.0756 

-0.4278 

0.3655  -0.3411 

-0.3576 

0.3495 

0.3495 

-0.3576 

-0.3411 

0.3655 

0.2878  -0.4863 

0.0914 

0.4151 

-0.4151  -0.0914 

0.4863 

-0.2878 

0.1985  -0.4626 

0.4590 

-0.1896 

-0.1896  0.4590 

-0.4626 

0.1985 

0.1012  -0.2793 

0.4154 

-0.4890 

0.4890  -0.4154 

0.2793 

-0.1012 

and  the  corresponding  eigenvalues  are  6.358,  0.931,  0.298,  0.148,  0.093,  0.068,  0.055, 
and  0.049  (Clarke  [1985]).  Note  that  each  row  vector  of  # is  either  symmetric  or 
antisymmetric.  In  fact,  it  can  be  shown  that 


^(r,k)  = (4-34) 

Therefore,  the  even  row  vectors  (r  = 0,  2,  ...)  are  symmetric  and  the  odd  row  vectors 
(r  = l,  3,  ...)  are  antisymmetric.  Note  that  this  result  does  not  depend  on  N or  p.  In 
fact,  since  the  Rxx  given  in  eq.  4.30  is  a symmetric  Toeplitz  matrix,  it  will  have 
fN/2"]  symmetric  and  [N/2J  antisymmetric  eigenvectors,  where  [a*]  denote  the 
smallest  integer  >a  and  [aj  denote  the  largest  integer  <a  (Cantoni  and  Butler 
[1976]).  The  result  is  true  for  a more  general  class  of  matrices  called  symmetric 
centrosymmetric  (SC)  matrices  where  symmetric  Toeplitz  matrix  is  a subset  of  the 
class  of  SC.  For  a symmetric  matrbc  R to  be  SC  matrk  if  and  only  if  JRJ  = R,  where 
J is  the  counter  identity  matrk  as  defined  in  eq.  4.11. 
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The  covariance  and  autocorrelation  matrices  of  any  stationary  sequence  are 
symmetric  Toeplitz  (Jain  [1989]).  This  can  be  shown  from  the  definition  of 
covariance  and  the  property  of  stationarity.  Therefore,  we  conclude  that  for  any 
stationary  sequence,  its  KLT  will  have  symmetric  basis  functions.  In  practical 
applications  where  stationarity  (or  even  wide  sense  stationarity)  does  not  hold  for  the 
data,  the  basis  functions  of  the  KLT  may  not  have  any  degree  of  symmetry  at  all. 

Next,  we  will  discuss  the  symmetry  of  the  basis  functions  of  data  independent 
transforms  (fixed  transforms)  as  opposed  to  the  data  dependent  transform  (KLT). 
These  transforms  can  be  broadly  categorized  as  sinusoidal  transform,  polynomial 
transform,  and  rectangular  transform  according  to  Akansu  and  Haddad  [1992]. 

4.4.2  Sinusoidal  Transforms 

Discrete  Fourier  transform  (DFT),  discrete  cosine  transform  (DCT),  and 
discrete  sine  transform  (DST)  are  three  major  sinusoidal  transforms.  All  the 
orthogonal  matrices  of  these  transforms  can  be  obtained  from  a unified  approach 
proposed  by  Jain  [1979]. 

If  a nonsingular  matrix  A is  diagonalized  by  a similarity  transformation  S,  i.e., 

S AS  - 

then  its  inverse  is  also  diagonalized  by  S so  that 

We  note  also  that  the  symmetric  Toeplitz  matrix  R^x  for  the  autocovariance 
of  the  Markov-I  signal  (eq.  4.30)  has  an  inverse  in  tridiagonal  form  given  by 
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Rx'x  = (1  - p")-‘ 


1 -p  0 0 


Q. 
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99  9 

• • • 

99  9 

• • • 

9 99 

999 

• « • 

• •• 

««  « 

99  9 

CL 

1 

l+p2 

-f 

• • « 

• • • 

999 

• • • 

-D 

1 

(4.35) 


It  is  evident  that  the  transformation  that  diagonalizes  eq.  4.35  will  also  diagonalize 
the  matrix  R^x-  Instead  of  considering  the  diagonalization  of  eq.  4.35,  which  will  lead 
to  the  KLT  basis  functions,  the  following  parametric  family  of  matrices  was 
considered  by  Jain  [1979]: 


-a  0 0 

-a  1 -a  0 


0 -a  1 -a  0 

...  0 -a  1 -a  

•••  •««  ••• 


(4.36) 


• •• 


ft • • •• 


...  -a  1 


k^a  0 0 -a  1-^a 

This  is  a variation  of  the  well-known  tridiagonal  Jacobi  matrix.  For  k3=  k4,  Q is  a 
symmetric  matrix,  and  for  a suitably  chosen  a value  it  would  be  admissible  as  a 
positive  definite  covariance  matrix.  For  kj=  k4=  0,  and 


a = p/(l  + p2),  p2  = (1  - p2)/(i  + p2)  (4.37) 

it  can  be  shown  that 


«p,p,0,0)  = (‘t-SS) 

Hence,  the  eigenvectors  of  Q(p  ,p  ,0,0)  are  the  same  as  those  of  the  KLT  in  the  first 
order  Markov  case.  Now  consider  all  those  Q matrices  in  eq.  4.36  that  are  admissible 
as  covariance  matrices,  i.e.,  for  those  values  of  kj,  k2,  k3=  k4,  and  a for  which  Q is 
positive  definite.  The  sinusoidal  family  of  unitary  transforms  is  the  class  of  complete 
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orthonormal  sets  of  eigenvectors  generated  by  these  Q matrices.  Note  that  associated 
with  each  covariance  matrix  there  is  a complete  orthonormal  set  of  eigenvectors. 
Table  4.1  summarizes  some  of  the  sinusoidal  transforms  obtained  by  solving  for 
orthonormal  vector  solutions  of 

O^r^N-1  (4.39) 

for  different  sets  of  kj,  1<  i <4.  In  this  table  §,  represents  the  r-th  column  of  the 
sinusoidal  transform  matrix  ♦.  All  of  the  eleven  transforms  listed  in  Table  4.1  are 
different  in  the  sense  that  they  are  eigenvectors  of  different  Q matrices.  This  table 
includes  well-known  transforms  such  as  the  KLT  (number  11),  the  DFT  (number  7), 
the  DST  (number  1),  and  the  DCT  (number  8).  From  eqs.  4.36  and  4.39,  and  for 
k3  = k4  (Jain  [1979]),  all  the  sinusoidal  transforms  are  solutions  to  the  homogeneous 
second-order  difference  equation 

O/k)-a[0>^(ife-l)+«>/k+l)]=X,O/)t),  Uk^N-2  (4.40) 

subject  to  the  parametric  family  of  boundary  conditions 

(l-k,a)<I»,(0)-a<I>,(l)+k3a<I»,(N-l)=A,^»/0) 
k3a3>,(0)-aa>,(N-2)+(l-ife2a)O/N-l)=A^^>/N-l)  ^ ‘ 

Thu?,  all  the  sinusoidal  transforms  satisfy  the  same  difference  equation  (i.e.,  eq.  4.40) 
and  differ  only  in  the  boundary  conditions  (eq.  4.41).  The  fact  that  sinusoidal 
transforms  are  solutions  to  eq.  4.40  and  eq.  4.41  can  be  checked  by  direct 
substitution.  The  results  shown  in  Table  4.1  can  be  obtained  by  assuming  that  the 
general  solution  to  eq.  4.40  has  the  form 

(4.42) 

where  A,  B,  and  0 are  complex  in  general,  and  depend  on  the  value  of  r.  A,  B,  and 
0 are  determined  by  applying  boundary  conditions  of  eq.  4.41. 

Some  DCT  and  DST  transforms  other  than  those  shown  in  Table  4.1  have 
been  proposed  recently.  Wang  [1984]  defined  four  types  of  DCT  and  four  types  of 
DST.  These  DCT  and  DST  transforms  are  shown  in  Table  4.2.  This  classification  of 
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Table  4.1  Some  members  of  the  sinusoidal  transforms  family. 


No. 

Q matrix 
parameters 

eigenvalues 

eigenvectors 

1 

k,=0 
k,  = 0 
k3  = 0 
k4  = 0 

1 

N+l 

2 . (r+l)(A:+l)7i 

. sm- — — — 

^ iV+l  N+l 

2 

kj  = -l 

k,  = -l 
k>0 
k4  = 0 

N 

2 . (2fc+l)(r+l)7i  , 

-sm- — — ^ , r iV-1 

2N 

— ,r  = N-1 

3 

ki  = 0 

k,  = 1 
k3  = 0 
k4  = 0 

21V+1 

2 „jj^(2r+l)(A:+l)7i 

y/2N+l  21V+1 

4 

kj  = 0 

k,  = -l 

k;=o 

k4  = 0 

l-2acos^^'"'^^^ 

2iV+l 

2 ^jj^2(r+l)(*+l)ji 

^2N+l 

5 

ki  = -l 

k,  = 0 
k‘  = 0 
k4  = 0 

21V+1 

2 „jj^(2^+l)(r+l)jt 

V'21V+1  27V+1 

6 

k,  = -l 
k,  = 1 
k^  = 0 
k4  = 0 

2N 

2„j^(2fc+l)(2r+l)7r 

4N 
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Table  4.1  Continued. 


No. 

Q matrix 
parameters 

eigenvalues 

eigenvectors 

7 

kj  = 0 

k,  = 0 
k3  = -l 
k4  = -l 

l-2acos^^ 

N 

1 ^±J2nrklN 

yfN 

8 

ki  = l 

k,=  1 

kj  = 0 

k4  = 0 

l-2acos— 

N 

2N 

-jz  =0 

v/N 

9 

ki  = l 

k2  = 0 
k3  = 0 
k4  = 0 

l-2acos^^'’^^^^ 

2JV+1 

v/2N+l  2(2iV+l) 

10 

ki  = l 

k2  = -l 
k3  = 0 
k4  = 0 

l-2acos^^'"'^^’' 

2N 

2 (2ife+l)(2r+l)7t 

\ COS 

N 4iV 

11 

ki=p 

k,=p 

k;=o 

k4  = 0 

eq.  4.31 

KLT  (Markov-I  Source) 
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Table  4.2  Four  types  of  discrete  sine  transforms  defined  by  Wang  [1984]. 


Type 

Basis  Functions  of  the  Transforms 

DST-I 

2 . (r+l)(fc+l)7i 

sm- — — — 

N+l  N+l 

DST-II 

C=1  for  r * N-1 
N 2N  forr  N 1 

fi 

DST-III 

— C.-l  for  k * N-l 

N 2N  ’ fork  N \ 

f2 

DST-IV 

2„in(2A:+l)(2r+l)7r 

N AN 
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Table  4.2  Continued. 


Type 

Basis  Functions  of  the  Transforms 

DCT-I 

C^=l  for  k*0,  k*N 
2 „ „ rkn  1 

^C,qcos  ^ for  k=0,  N 

y/2 

DCT-II 

A 

C=l  for  r * Q 
2^  (2^+l)/-7t  1 

\ 2N  ' - 

t/2 

DCT-III 

Cu-\  for  k * Q 
N'^  2N  ’ for  kQ 

v/2 

DCT-IV 

2 (2it+l)(2r+l)7t 

A — COS 

\N  4N 
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DCT  and  DST  was  also  used  in  Rao  and  Yip  [1990].  DCT-I  was  introduced  by  Wang 
and  Hunt  [1983].  DCT-II  and  its  inverse  DCT-III  were  introduced  by  Ahmed  et  al. 
[1974].  The  DCT-IV  was  introduced  by  Jain  [1979].  DST-I  and  DST-IV  were 
introduced  by  Jain  [1976,1979].  DST-II  and  its  inverse  DST-III  were  introduced  by 
Kekre  and  Solanki  [1978]. 

Note  that  not  all  the  basis  functions  of  sinusoidal  transforms  have  the 
symmetric  property.  In  other  words,  symmetric  property  is  not  a necessary  condition 
for  the  basis  functions  to  form  a valid  sinusoidal  transform.  However,  basis  functions 
of  the  most  frequently  used  sinusoidal  transforms,  i.e.,  DFT  (Table  4.1,  No.  7),  DST 
(Table  4.2,  DST-I),  and  DCT  (Table  4.2,  DCT-II),  do  have  the  symmetric  property. 

4.4.3  Discrete  Polynomial  Transforms 

The  class  of  discrete  polynomial  transforms  are  descendants  of  their  analog 
counterparts.  They  are  uniquely  identified  by  the  interval  of  support,  weighting 
function,  and  normalization.  Two  transforms  are  described  here.  They  are  Binomial- 
Hermite  family  and  the  Legendre  polynomials. 

The  derivations  of  the  discrete  polynomial  transforms  from  their  analog 
counterparts  usually  are  quite  involved.  In  addition,  they  are  available  in  numerous 
papers  and  books.  In  this  section,  only  results  of  the  derivations  are  presented.  The 
main  interest  here  is  to  see  if  the  basis  functions  of  these  two  discrete  polynomial 
transforms  are  symmetric. 

4.4.3. 1 The  Binomial-Hermite  Transform 

The  family  of  discrete  weighted  orthogonal  functions  was  first  developed  in 
a paper  by  Haddad  [1971],  and  was  subsequently  orthonormalized  in  Haddad  and 
Akansu  [1988]. 

The  members  of  the  discrete  Binomial-Hermite  family  are  generated  by 
successive  differences  of  the  Binomial  sequence 
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'N-r 


\ 


, r=0,l,,„^ 


(4.43) 


/ 


where  Vf(n)  = f(n)-f(n-l),  Taking  successive  differences  gives 


\m) 


(N 


\ 


lit 


(4.44) 


/ 


(4.45) 


where  k^‘"\  the  forward  factorial  function,  is  a polynomial  in  k of  degree  m 

k(k-l)  - (k-m+1),  m^l 
^ \ 1,  m=0 

The  polynomials  in  eq.  4,44  are  the  discrete  Hermite  polynomials. 

The  Hermite  polynomials  and  the  Binomial-Hermite  sequences  are  orthogonal 
over  the  interval  [0,N]  with  respect  to  weighting  sequences 


\k, 


and 


fN\-^ 

yk, 


respectively, 
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\-i 


/ 


■E  K»mk\ 

*=o  \ k 


=T 


N 


\-i 


r-s 


These  weighted  orthogonality  properties  suggest  that  by  proper  normalization  the 
Hermite  transform  can  provide  a unitary  matrix  suitable  for  signal  coding.  This 
modified  Hermite  transform  is  defined  as 


^(r,k) 


V ^ 


/ V 


ik) 


(4.47) 


We  want  to  show  that0(r,N-k)  = (-l)V(r,k),  or  equivalently  H,(N-k)  = (-l)''HXk)  using 
eq.  4,47  and  the  fact  that 


\k]\N-k, 

From  eq.  4.44,  this  is  equivalent  to  showing  that  Br(N-k)  = (-l)‘'B,.(k).  Instead  of  using 
the  expression  of  B,.(k)  in  eq,  4.44  to  do  the  proof,  a much  simpler  way  is  to  deal 
with  this  problem  in  the  frequency  domain. 

First,  one  notes  that  the  expansion  of  (l  + z'^)*^  is  the  following  binomial  series 
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where 
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(4.48) 
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k=0 


-k 


Also,  it  can  be  shown  that 


From  these  properties,  we  can  verify  that 

B,(z>Z 

and  that  B,(-z)  = which  implies  that  B/N-k)  = (-l)''Bj.(k). 


=(i-z-^yz 

[1^1 

k jj 

(4.49) 


(4.50) 


4.4.3.2  The  Discrete  Legendre  Polynomials 

The  discrete  Hermite  polynomials  weighted  by  the  Binomial  sequence  are 
suitable  for  representing  signals  with  Gaussian-like  features  on  a finite  interval.  The 
discrete  Legendre  polynomials  are,  on  the  other  hand,  uniformly  weighted  on  a finite 
interval.  The  derivation  of  this  family  is  briefly  discussed  here. 

Let  Lj(k)  be  a polynomial  of  degree  r on  [0,N-1], 

L^(k)  = l+a^jk+a^Jc^^^+-+cc^^^'‘\  r=0,l,-,N-l  (4.51) 

a„  are  chosen  to  satisfy  orthogonality 

£ (4.52) 

k=0 

Morrison  [1969]  showed  that  the  result  is 


r 


and  the  associated  norms  are 


r+s^ 
s j 


, O^s^r 


(4.53) 
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2 


(4.54) 


The  orthonormalized  discrete  Legendre  transform  (DLT)  is  therefore 


(4.55) 


It  was  pointed  out  by  Akansu  and  Haddad  [1992]  that  the  even  and  the  odd  indexed 
rows  of  the  DLT  matrix  are  respectively  symmetric  and  skew  symmetric  about  N/2 
if  N is  even. 

4.4.4  Rectangular  Transforms 

The  term  rectangular  transform  is  used  to  denote  orthonormal  basis  sequences 
obtained  by  sampling  analog  functions  that  are  switched  pulses  in  time.  In  the  Walsh 
family,  the  pulse  amplitudes  are  ± 1 for  every  member  of  the  set.  For  the  Haar 
functions,  the  values  of  the  switched  amplitudes  can  vary  from  row  to  row. 

4.4.4. 1 The  Walsh-Hadamard  Transform 

The  Hadamard  transform  matrices  of  order  N=2P  are  defined  recursively: 


where  ® indicates  the  Kronecker  product.  The  basis  vectors  of  the  Hadamard 
transform  can  also  be  generated  by  sampling  a class  of  functions  called  the  Walsh 


orthonormal  basis  for  square  integrable  functions.  For  this  reason  the  Hadamard 
transform  just  defined  is  also  called  the  Walsh-Hadamard  transform. 

It  is  easy  to  see  that  the  basis  functions  of  the  Hadamard  transform  are  highly 
symmetric. 


(4.57) 


(4.56) 


functions.  These  functions  also  take  only  the  binary  values  ± 1 and  form  a complete 
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4AA.2  The  Haar  transform 

The  Haar  functions  belong  to  an  orthogonal  family  of  switched  rectangular 
waveforms  where  the  amplitudes  can  differ  from  one  function  to  another.  They  are 
defined  on  the  interval  [0,1)  by 


1 


2^  2^ 

-2"^, 

2"*  2" 
0,  otherwise  in  [0,1) 


(4.58) 


The  index  r = 0,l,...,  N-1,  and  N=2^.  Also,  m and  k represent  the  integer 
decomposition  of  the  index  r = 2"’  + k-l.  These  are  rectangular  functions  that  can  be 
zero  in  subintervals  of  [0,1).  The  Haar  transform  is  obtained  by  letting  t take  discrete 
values  at  q/N,  q = 0,l,...,  N-1.  For  N = 8,  the  Haar  transform  is  symbolized  by  Hrg  and 
is  shown  in  eq.  4.59. 


8 


1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

1 

-1 

-1 

-1 

-1 

-v^ 

0 

0 

0 

0 

_ 1 

0 

0 

0 

0 

v/2 

(4.59) 
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we 

can 

see 

that 

some  of 

the 

basis 

functions  of  the  Haar 

transform  do  not  have  the  degree  of  symmetry  of  N /2.  However,  they  all  have  other 
degree  of  symmetry.  For  example,  the  fifth  row  vector  of  eq.  4.59  has  N/4-degree  of 
symmetry,  but  it  does  not  have  N/2  degree  of  symmetry,  where  N = 8. 
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4.4.5  Significance  of  the  Symmetrical  Basis  Functions 

In  Sections  4.4. 1 to  4.4.4,  the  symmetries  of  basis  functions  of  the  commonly 
used  transforms  are  investigated.  It  is  found  that  most  of  them  do  have  symmetries. 
Based  on  the  analysis  of  Section  4.3,  potential  fast  transforms  can  be  developed  for 
the  transforms  discussed  in  Section  4.4.  However,  since  the  degree  of  symmetry  varies 
from  one  transform  to  another,  the  actual  developments  of  the  fast  transforms  may 
not  follow  the  same  derivations.  In  other  words,  this  chapter  only  provides  a unified 
framework  from  which  the  fast  transforms  can  be  developed.  Different  treatments 
may  be  required  to  develop  different  fast  transforms.  In  the  next  chapter,  DCT  and 
DFT  are  chosen  to  demonstrate  how  fast  progressive  inverse  transforms  can  be 
developed  by  using  the  symmetrical  property  of  DCT  and  DFT,  respectively. 


CHAPTER  V 

FAST  RECONSTRUCTION  OF  TRANSFORMED  IMAGES 

5.1  Introduction 

The  general  principle  and  theory  discussed  in  Chapters  III  and  IV  are  used 
in  this  chapter  to  develop  an  algorithm  for  fast  progressive  reconstructions  of  discrete 
cosine  transformed  (DCT)  and  discrete  Fourier  transformed  (DFT)  images.  The 
developed  algorithm  is  intended  for  use  in  an  image  telebrowsing  system  operating 
under  a low  bit-rate  channel. 

In  this  chapter,  a fast  inverse  PIT  process  based  on  a two-dimensional  discrete 
cosine  transform  (2-D  DCT)  will  be  investigated.  There  are  several  reasons  for 
choosing  the  DCT-based  scheme.  First,  it  is  the  current  standard  for  the  PIT.  The 
result  of  the  study  may  have  immediate  application  in  the  industry.  Second,  the  DCT 
has  the  energy  packing  capabilities  that  approach  the  statistically  optimal  transform 
(i.e.  the  KLT)  in  decorrelating  a signal  governed  by  a Markov  process  (Rao  and  Yip 
[1990]).  Third,  various  fast  DCT  algorithms  have  been  developed.  Therefore,  we  can 
compare  the  performance  of  the  posed  approach  with  other  fast  inverse  DCT 
algorithms. 

5.2  Discrete  Cosine  Transformed  Images 

The  2-D  DCT  and  its  inverse  are  defined  as  follows.  Let  f be  an  M x N matrix 
representing  the  2-D  data  and  F be  the  2-D  DCT  (type  II)(Wang  [1984],  Rao  and 
Yip  [1990])  of  f.  Then, 


89 


90 


^00 


4) 


10 


<l) 

4) 


0^-1 


1^-1 


4>i»/-i,o  ■■■  4>w-i^-i 


faa 

f\Q 


Jq^-\ 

f\^-\ 


4^00  •••  4^7^-1,0 

4^01  4f//-i,i 


(5.1) 


)w-l,0  ■■■  ^-1^-1.  4^0/^-!  ■"  4^A^-1/^-1. 
where  § and  Y are  the  unitary  DCT  matrices  discussed  in  section  4.4.2  and  listed  in 

Table  4.2.  The  r-k  element  of  i is 
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Substituting  ^nd  41,^  in  eq.  5.1  with  those  in  eq.  5.2a  and  5.2b,  respectively,  gives 
the  u-v  element  of  F as 
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where  u = 0,  ...,  M-1,  v = 0, ...,  N-1.  Similarly,  the  x-y  element  of  f is  given  by  the  2-D 
inverse  DCT  (IDCT)  of  F,  defined  as 
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Substituting 0,i.  and  41^,.  in  eq.  5.4  with  those  in  eq.  5.2a  and  5.2b,  respectively,  gives 
the  x-y  element  of  f as 
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The  remaining  section  is  organized  as  follows.  First,  extensive  computational 
requirements  and  related  problems  for  IDCTs  in  the  inverse  PIT  are  addressed. 
Second,  traditional  fast  IDCT  algorithms  and  their  inherent  drawbacks  in  the  inverse 
PIT  process  are  discussed.  Third,  the  theoretical  background  of  proposed  approach 
is  introduced.  Also,  based  on  the  proposed  approach,  an  algorithm  for  the  IDCT  is 
proposed.  Finally,  the  algorithm  is  compared  to  the  other  fast  IDCT  algorithms  in 
terms  of  the  computational  complexity  and  other  issues. 

5.2.1  Heavy  Computation  of  IDCTs  in  Inverse  PIT  Process 

A typical  example  of  a DCT-based  PIT  scheme  is  the  JPEG’s  scheme  which 
was  introduced  in  Appendix  A.  There  are  other  non-JPEG  DCT-based  PIT  schemes 
available,  including  multistage  quantization  scheme  and  feedback  transform  coding 
scheme. 

In  a DCT-based  multistage  quantization  scheme,  the  DCT  coefficients  are 
compressed  by  different  quantizers  at  different  stages  (Wang  and  Goldberg  [1986], 
Wang  and  Goldberg  [1988],  Aghagolzadeh  and  Ersoy  [1991]).  Usually,  the 
reconstructed  image  from  quantization  steps  of  the  first  quantizer  is  the  coarsest  and 
becomes  finer  as  subsequent  quantizers  are  performed.  The  basic  principle  used  in 
this  scheme  is  to  quantize  the  DCT  coefficients  in  a coarser  quantizer.  Then,  the 
differences  between  the  DCT  coefficients  and  their  quantized  versions  are  further 
quantized  using  a finer  quantizer.  This  process  can  be  extended  to  more  than  two 
stages  as  shown  in  Fig.  2.3.  Again,  at  each  stage  of  image  reconstruction,  an  IDCT 
is  needed  for  each  of  the  image  blocks.  The  major  difference  between  multistage 
quantization  scheme  and  the  JPEG  scheme  for  the  PIT  is  that  the  former  produces 
different  sets  of  compressed  data  at  different  stages  while  the  latter  produces  a single 
set  of  compressed  data  but  chopped  into  different  pieces  at  different  stages.  In  fact, 
each  set  of  compressed  data  in  the  multistage  quantization  scheme  can  be  chopped 
into  different  pieces  as  the  JPEG  scheme  does. 
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In  a feedback  transform  coding  (FTC)  scheme  (Neagoe  [1992]),  the  number 
of  nonzero  DCT  coefficients  is  adaptively  increased  until  the  mean  squared  error 
between  the  original  image  subblock  and  the  reconstructed  one  becomes  less  than 
or  equal  to  a given  threshold.  In  other  words,  the  FTC  retains  only  a "just  enough" 
number  of  nonzero  DCT  coefficients  such  that  the  compression  ratio  is  maximized 
while  the  reconstruction  error  is  kept  under  a given  threshold  value.  The  coding 
scheme  is  conceptually  similar  to  the  spectral  selection  scheme  in  the  JPEG’s  PIT 
operation.  It  is  useful  in  an  off-line  process  where  the  sender  can  predetermine  which 
DCT  coefficients  should  be  retained  or  discarded.  The  same  idea  can  be  used  in  an 
image  telebrowsing  system  where  the  DCT  coefficients  are  calculated  and  some  of 
which  are  retained  according  to  the  FTC  scheme. 

Upon  a user’s  request,  the  DCT  coefficients  can  be  sent  without  delay.  So,  the 
processing  time  of  the  sender  is  not  considered  as  part  of  the  response  time  which 
was  defined  in  Chapter  I.  Even  though  the  FTC  scheme  is  basically  for  an  off-line 
process  where  longer  computation  time  can  be  tolerated  than  in  the  on-line  process, 
less  computation  time  is  always  welcome.  Since  the  mean  squared  error  for  the 
spatial  and  transform  domain  are  identical  insofar  as  the  unitary  transforms  are 
concerned,  it  is  not  necessary  to  return  to  the  original  picture  domain  for  every  FTC 
iteration  to  evaluate  the  mean  squared  error.  That  is,  no  IDCT  calculation  is 
required  for  every  FTC  iteration.  However,  it  is  well  known  that  the  mean  squared 
error  is  an  objective  fidelity  criterion  which  does  not  necessarily  reflect  the  visual 
quality  of  the  picture.  Thus,  the  FTC  should  be  modified  to  allow  human 
interventions  such  that  both  image  quality  and  mean  squared  error  can  be  monitored. 
In  this  case,  intensive  computation  of  the  IDCT  is  unavoidable  in  the  FTC. 

If  the  DCT-based  PIT  is  used,  the  major  task  in  the  inverse  PIT  process  is  to 
perform  at  least  one  IDCT  for  each  subblock  whose  size  is  usually  small.  Given  r 
stages  of  reconstruction  of  a greyscale  image,  the  total  number  of  M x N IDCTs 
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required  is  r x (P  x Q)  / (M  x N),  where  P and  Q are  the  dimensions  of  the  image 
and  M and  N are  the  dimensions  of  an  image  block.  If  color  images  are  considered, 
the  number  will  be  roughly  triple.  The  lower  bound  for  r is  1 but  its  upper  bound 
depends  on  the  image,  the  viewer,  and  the  PIT  scheme.  In  the  JPEG  standard,  a 
block  size  of  8 x 8 is  chosen.  For  a 512  x 512  grayscale  image  with  8x8  block  size, 
the  number  of  IDCTs  required  is  4096r.  For  a 1024  x 1024  image,  it  would  require 
16384r  IDCTs.  Due  to  the  fast  development  in  image  capturing  and  display 
technique,  images  larger  than  1024  x 1024  in  size  are  not  unusual  nowadays.  In 
addition,  because  there  is  a strong  demand  on  high  resolution  display  in  the  near 
future  the  image  size  is  expected  to  increase. 

To  get  some  idea  on  how  intensive  the  computational  requirements  are,  let 
us  consider  the  following  scenario.  Assuming  all  the  images  to  be  browsed  are 
divided  into  8x8  blocks,  and  the  IDCTs  are  computed  using  eq.  5.5.  Each  block  will 
take  about  8^*  (=  4096)  multiplications  because  the  computational  complexity  is 
©(N'*),  where  N is  the  size  of  input  square  matrk.  Now,  assuming  10  color  images 
with  the  size  of  1024  x 1024  are  sent  using  a 4-stage  PIT.  Also,  assume  each  color 
image  has  three  color  components,  i.e.,  red,  green,  and  blue.  The  total  number  of 
multiplications  required  for  this  case  is  about  10  x 3 x [(1024  x 1024)  / (8  x 8)]  x 4 

X 4096  « 8 X 10^  This  suggests  that  computations  for  the  IDCT  in  the  inverse  PIT  are 
by  no  means  trivial. 

5.2.2  Traditional  Approaches  to  IDCT  Computing 

The  time  needed  for  computing  the  IDCTs  in  a telebrowsing  system  is 
proportional  to  the  number  of  images  under  browsing,  the  size  of  the  images,  the 
number  of  browsing  stages,  and  the  way  that  the  IDCTs  are  performed.  The  first 
three  factors  are  at  the  choice  of  the  user  and  are  not  "negotiable".  Thus  the 
performance  of  the  IDCT  will  be  the  key  factor  to  reducing  the  computation  time. 
As  shown  earlier,  the  direct  computation  of  DCT  or  IDCT  is  of  ©(N"*).  Researchers 


94 


have  been  trying  to  reduce  the  order  of  complexity  by  developing  fast  algorithms  for 
many  years. 

Traditional  fast  algorithms  for  the  2-D  DCT/IDCT  fall  into  two  major 
categories  - direct  approaches  and  indirect  approaches.  In  indirect  approaches,  the 
2-D  DCT  is  reduced  to  a series  of  1-D  DCTs  by  using  the  separability  property  of 
the  DCT  (Rao  and  Yip  [1990]).  The  separability  property  of  the  2-D  IDCT  can  be 
illustrated  as  follows. 

Rewrite  eq.  5.5  as 
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The  inner  summation  is  an  N-point  1-D  IDCT  of  the  rows  of  F,  whereas  the  outer 
summation  represents  the  M-point  1-D  IDCT  of  the  columns  of  the 
"semitransformed"  matrbc.  This  implies  that  a 2-D  (M  x N)  IDCT  can  be 
implemented  by  M N-point  1-D  IDCTs  along  the  rows  of  F,  followed  by  N M-point 
1-D  IDCTs  along  the  columns  of  the  matrix  obtained  after  the  row  transformation. 
There  are  many  fast  algorithms  available  for  1-D  DCT  (Chen  et  al.  [1977],  Makhoul 
[1980],  Wang  [1984],  Lee  [1984],  Yang  and  Narasimha  [1985],  Yip  and  Rao  [1985], 
Malvar  [1986],  Duhamel  and  H’Mida  [1987],  Hou  [1987],  Malvar  [1987],  Loeffler  et 
al.  [1989],  Li  [1991],  Chin  and  Wu  [1991]). 

In  direct  approaches,  the  2-D  DCT  for  a M x N block  is  performed  in  a way 
similar  to  MN  points  1-D  DCT.  Some  fast  2-D  DCT  algorithms  using  direct  approach 
are  also  available  (Makhoul  [1980],  Kamangar  [1982],  Vetterli  [1985],  Haque  [1985], 
Duhamel  and  Guillemot  [1990],  Chan  and  Ho  [1991],  Cho  and  Lee  [1991],  Ta  et  al. 
[1991]).  In  general,  direct  approaches  are  more  efficient  than  indirect  approaches  in 
terms  of  number  of  multiplications  and  additions.  However,  the  implementational 
complexity  for  the  preparation  of  DCT  computing  (e.g.  data  rearrangement)  is  also 
higher.  Using  direct  approaches,  the  best  computational  complexity  is  about 
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©(N'logjN)  for  a square  input  matrix  with  size  N.  The  actual  number  of 
multiplications  for  an  8 x 8 DCT  or  IDCT  ranges  from  96  to  320  (Makhoul  [1980], 
Cho  and  Lee  [1991]).  Besides  the  number  of  multiplications,  the  number  of  additions 
is  usually  considered  as  well  in  performance  evaluation  of  the  algorithms.  The  order 
of  complexity  for  number  of  additions  is  about  the  same  as  that  of  multiplications 
when  using  the  direct  approaches.  The  actual  number  of  additions  is  between  430 
and  580  (Makhoul  [1980],  Cho  and  Lee  [1991]). 

5.2.3  The  Problems  of  Traditional  Approaches  in  PIT 

In  section  4.2.2,  it  was  mentioned  that  computational  redundancy  and  delay 
time  are  two  inherent  drawbacks  of  PIT.  Further  discussion  of  these  drawbacks  for 
the  IDCT  computing  in  the  inverse  PIT  is  given  in  this  section. 

Because  image  data  typically  vary  very  little  from  one  pixel  to  the  next  pbcel 
across  an  image,  the  forward  DCT  (FDCT)  processing  step  lays  the  foundation  for 
achieving  data  compression  by  concentrating  most  of  the  signal  in  lower  spatial 
frequencies.  For  typical  source  images,  most  of  the  spatial  frequencies  of  an  8 x 8 
image  block  have  zero  or  near-zero  amplitude  and  need  not  be  encoded. 

One  good  way  of  visualizing  the  redundancy  in  traditional  algorithms  for  the 
IDCT  is  to  use  the  signal  flowgraph  as  discussed  in  Chapter  IV.  A signal  flowgraph 
represents  the  input-output  relationship  of  a process,  or  more  generally,  a system.  For 
an  IDCT  signal  flowgraph,  it  reflects  the  exact  computational  complexity  required  to 
implement  an  IDCT.  Based  on  different  algorithms  for  the  IDCT,  different  signal 
flowgraphs  can  be  constructed.  For  example,  a 1-D  4-point  IDCT  proposed  by  Chen 
et  al.  [1977]  is  shown  in  Fig.  5.1.  For  an  4 x 4 IDCT  using  direct  approaches,  the 
signal  flowgraph  will  have  16  input  nodes  and  16  output  nodes.  Since  a lot  of  inputs 
presented  at  16  input  nodes  are  zeros,  the  paths  between  a zero  input  node  and 
output  nodes  are  trivial  or  redundant  in  the  sense  that  the  input  from  that  node  does 
not  contribute  anything  to  the  output  except  a zero. 
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Fig.  5.1  A 4-point  IDCT  (Chen  et  al.  [1977]). 
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The  redundancy  comes  from  the  fact  that  traditional  fast  IDCT  algorithms 
deal  with  the  entire  coefficients  in  a block  collectively  rather  than  each  coefficient 
individually.  In  other  words,  the  characteristic  of  an  input  image  block  to  IDCT  is 
generally  not  exploited  in  traditional  algorithms.  Those  algorithms,  therefore,  are  not 
efficient  in  dealing  with  highly  sparse  matrices,  and  are  not  efficient  in  the  DCT- 
based  inverse  PIT. 

In  addition  to  the  redundancy  problem  discussed  above,  the  delay  time  also 
exists  in  traditional  IDCT.  In  general,  an  IDCT  of  a block  of  coefficients  cannot  be 
computed  until  all  the  coefficients  in  the  block  are  available. 

5.2.4  The  Proposed  Approach 

In  the  inverse  PIT  process,  computation  burden  of  the  IDCT  and  computation 
redundancy  associated  with  traditional  algorithms  are  two  major  problems.  The 
improper  coefficient  scanning  order  and  the  existence  of  the  delay  time  in  Takikawa’s 
and  Miran  and  Rao’s  approach  present  two  more  problems  in  the  inverse  PIT 
process.  In  view  of  all  these  problems,  the  proposed  approach  is  intended  to  meet 
the  following  goals.  First,  it  must  be  fast  and  efficient.  Second,  it  must  allow  a 
scanning  pattern  that  can  conform  to  the  visual  significance.  Finally,  it  must  have 
practically  no  delay  time. 

The  proposed  approach  will  be  discussed  in  the  following  manner.  First,  the 
theoretical  background  of  the  approach  will  be  presented.  Based  on  the  approach, 
an  algorithm  with  less  computational  redundancy  is  proposed.  Next,  the 
computational  complexity  of  the  proposed  algorithm  is  examined  and  compared  with 
that  of  traditional  algorithms.  Finally,  the  scanning  pattern  and  delay  time  problems 
in  the  inverse  PIT  process  are  discussed. 


98 


5.2.4. 1 Theoretical  Background 

If  the  IDCT  is  used  in  image  decompression,  its  input  block  contains  only  few 
nonzero  coefficients.  This  has  been  demonstrated  by  an  example  in  Appendix  A (Fig 
A.4),  and  it  is  generally  true  for  the  images  examined  by  this  study.  In  addition,  if  the 
PIT  scheme  is  used,  the  input  matrix  to  IDCT  may  contain  even  fewer  nonzero 
elements.  How  many  elements  in  the  matrix  must  be  zero  for  it  to  be  considered 
sparse  depends  on  the  applications.  Generally,  a matrix  is  called  sparse  if  there  is  a 
computational  advantage  to  exploit  its  zeros  (Duff  et  al.  [1986]).  It  is  well  known  that 
exploiting  the  sparsity  of  a matrix  can  lead  to  enormous  computational  savings  in 
many  applications,  such  as  solving  simultaneous  equations  with  Gauss-Jordan 
elimination  method.  Inspired  by  this  fact,  the  sparsity  of  the  input  matrix  will  be 
exploited  to  compute  the  IDCT  efficiently  in  the  PIT  environment. 

For  ease  of  exposition,  several  terms  are  defined  first.  A target  matrix  is  an 
image  subblock  consisting  of  the  quantized  DCT  coefficients  that  are  partially 
encoded  (for  example,  by  using  spectral  selection  and/or  successive  approximation 
schemes  as  recommended  by  JPEG).  Performing  an  IDCT  on  a target  matrix  results 
in  a matrix  called  goal  matrix.  The  result  of  processing  one  nonzero  element  in  the 
target  matrix  is  called  a partial  contribution  to  the  goal  matrix.  Throughout  the  rest 
of  the  dissertation,  the  partial  contribution  is  treated  as  a matrix  or  all  its  elements 
in  the  matrbc  depending  on  the  context. 

Based  on  the  definition  of  the  2-D  IDCT,  only  nonzero  elements  in  the  target 
matrix  can  contribute  to  the  goal  matrix.  In  fact,  the  value  of  each  nonzero  element 
can  affect  the  values  of  all  elements  in  the  goal  matrix.  The  idea  of  the  proposed 
approach  is  to  completely  ignore  the  zero  elements  in  the  target  matrix  and  process 
each  nonzero  element  separately  and  efficiently.  The  goal  matrix  is  then  updated  by 
combining  the  partial  contribution  due  to  the  processing  of  each  nonzero  element. 
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Therefore,  the  computation  of  the  IDCT  is  divided  into  two  tasks,  i.e.,  the 
computation  of  partial  contribution  and  the  combination  of  partial  contributions. 

The  remaining  part  of  this  subsection  is  presented  as  follows.  First,  the 
concept  of  mirror  effect  of  partial  contribution  is  introduced  as  the  theoretical  basis 
of  computing  partial  contributions.  Then,  direct  computation  of  the  partial 
contribution  by  its  definition  and  its  fast  version  are  demonstrated.  Finally,  how  to 
combine  the  partial  contributions  into  final  goal  matrix  and  the  issues  of 
computational  complexity  are  discussed. 

For  convenience,  the  IDCT  is  shown  again  as  follows: 
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(5.7) 


here  x = 0, ...,  M-l,  y = 0, ...,  N-l.  The  coefficient  in  front  of  the  double  summation  of 
eq.  5.7  is  a scale  factor  which  requires  essentially  no  computation  (except  a register 
shift  operation).  In  practical  applications,  M = N and  M=4,  8,  or  16  are  often  used. 
Thus,  it  is  usually  neglected  by  researchers  when  comparing  the  computational 
complexity  among  the  fast  algorithms  of  the  IDCT.  By  taking  the  scale  factor  out,  the 
normalized  or  scaled  form  of  eq.  5.7  is 
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If  [fx,y]uv  is  defined  as  the  partial  contribution  of  F„^  to  the  goal  matrk  [^ , then 
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We  can  show  the  following  symmetrical  relationships  of  the  partial  contribution: 


(5.10) 
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where  R = cd[M,u],  S = cd[N,v],  and  cd[a,b]  is  the  common  divisor  of  a and  b.  The 
proofs  of  eq.  5.10,  eq.  5.11,  and  eq.  5.12  are  given  in  Appendix  C.  Eq.  5.10  indicates 
the  row  symmetry  of  the  partial  contribution  with  respect  to  a horizontal  line 


x=-\  — -1 

2[R 


(5.13) 


Similarly,  eq.  5.1 1 shows  the  column  symmetry  of  the  partial  contribution  with  respect 
to  a vertical  line 
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Eq.  5.12  indicates  the  symmetric  relationship  of  the  partial  contribution  with  respect 
to  a point  (x,y),  where  x and  y are  shown  in  eq.  5.13  and  eq.  5.14,  respectively. 

Since  R = cd(M,u),  the  minimum  of  R = R„in=  1 and  the  maximum  of  R = 


^niax~  gcd(M,u)  or  R e [l,gcd(M,u)],  where  gcd(a,b)  is  the  greatest  common  divisor 
of  integers  a and  b.  Similarly,  since  S = cd(N,v),  the  minimum  of  S = S„:.=  1 and  the 


maximum  of  S = gcd(N,v)  or  S e [l,gcd(N,v)]. 

Since  the  values  of  R and  S can  be  multiple,  the  partial  contribution  exhibits 
multiple  degrees  of  symmetry  or  mirror  effect.  This  effect  can  be  visualized  more 
clearly  by  an  example.  Without  loss  of  generality,  let  M and  N be  even.  Consider  the 
case  when  R = R^a,^=gcd(M,u)  = l and  S = S„a,;=gcd(N,v)  = l.  In  this  case,  the  partial 
contribution  can  be  divided  into  four  M/2  x N/2  submatrices  as  shown  in  Fig.  5.2. 
Given  Part  I of  the  partial  contribution.  Part  II  of  the  partial  contribution  is  known 
because  of  the  row  symmetry  with  respect  to  the  horizontal  line 
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Fig.  5.2  The  partial  contribution  to  a goal  matrix  is  divided  into  4 submatrices. 
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•t=-i(W-l)  (5.15) 

Similarly,  given  Part  I and  Part  II  of  the  partial  contribution,  Part  III  and  Part  IV  of 
the  partial  contribution  can  be  determined  by  using  the  column  symmetry  with 
respect  to  the  vertical  line 

yi(N-l)  (5.16) 

Therefore,  given  only  Part  I of  the  partial  contribution,  the  entire  partial  contribution 
can  be  determined  by  using  the  row  symmetry  of  eq.  5.10  and  the  column  symmetry 
of  eq.  5.11.  Note  that  eq.  5.12  is  not  used  here.  In  fact,  eq.  5.12  is  redundant  in 
determining  the  entire  partial  contribution  because  it  can  be  derived  directly  from 
eq.  5.10  and  eq.  5.11  (refer  to  Appendix  C).  Thus,  for  practical  consideration,  eq. 
5.10  and  eq.  5.11  are  enough  to  reconstruct  the  entire  partial  contribution  and  eq. 
5.12  can  be  discarded.  Also  note  that  only  possible  sign  changes  are  involved  in  eq. 
5.10  and  eq.  5.11  and  practically  no  addition  or  multiplication  is  required.  The 
significance  of  this  result  is  that  only  a quarter  of  the  partial  contribution  needs  to 
be  determined  through  multiplications.  The  rest  of  them  can  be  determined  by 
simple  copy  operations  and  possible  sign  changes. 

Further  computation  savings  are  possible  if  gcd(M,u)>l  or  gcd(N,v)>l.  For 
example,  consider  the  case  when  gcd(M,u)=2  and  gcd(N,v)=2.  In  this  case.  Part  I of 
the  partial  contribution  in  Fig.  5.2  can  be  further  divided  equally  into  four  parts  of 
the  partial  contribution,  say  parts  V,  VI,  VII,  and  VIII.  Again,  given  Part  V of  the 
partial  contribution.  Parts  VI,  VII,  and  VIII  can  be  determined  by  using  the 
symmetrical  relationships  of  eq.  5.10  and  eq.  5.11.  In  this  case,  the  symmetrical  lines 
are  denoted  by  eq.  5.13  and  eq.  5.14  with  R = S=2.  Once  Parts  V,  VI,  VII,  and  VIII 
of  the  partial  contribution  are  determined.  Part  I of  the  partial  contribution  is  known. 
Since  Rn,a^=gcd(m,u)  = 2 and  Sn,a^=gcd(n,v)=2  guarantee  R = cd(M,u)  = l and 
S = cd(N,v)  = 1,  eq.  5.10  and  eq.  5.11  can  be  used  again  to  determine  Parts  II,  III,  and 
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IV  of  the  partial  contribution.  The  horizontal  and  vertical  symmetrical  lines  are 
represented  by  eq.  5.13  and  eq.  5.14  with  R = S = 1 or  eq.  5.15,  and  eq.  5.16, 
respectively.  Therefore,  only  Part  V of  the  partial  contribution  needs  to  be 
determined  through  regular  multiplications  and  the  remaining  parts  of  the  partial 
contribution  can  be  determined  by  using  eq.  5.10  and  eq.  5.11  twice.  In  this  case,  only 
one-sbcteenth  of  the  partial  contribution  needs  to  be  determined  through  regular 
multiplications.  In  general,  the  required  number  of  multiplications  to  determine  the 
entire  partial  contribution  is  proportional  to 


Since  and  are  the  functions  of  u and  v,  respectively,  the  computational 
complexity  of  the  partial  contribution  depends  on  the  u-v  index  of  a coefficient 

5.2.4.2  The  Expansion  of  Partial  Contribution 

Using  the  mirror  effect  can  lead  to  great  savings  in  computation  of  partial 
contribution.  Consider  the  following  example:  compute  the  8 x 8 partial  contribution 
due  to  F42.  Since  R^^^=gcd(8,4)=4  and  S^^, = gcd(8,2) = 2,  M/(2*4)  = l and 
N/(2*2)  = 2.  In  other  words,  only  1 x 2 of  the  8x8  partial  contribution  needs  to  be 
calculated  explicitly,  which  is 


where  fgo  = F42COs(7t  /4)sin(7r  /8)  and  foi  = F42Cos(7r/4)cos(7r/8).  The  expansion  can  be 
done  by  row  first  and  then  by  column  or  by  column  first  and  then  by  row.  To  expand 
the  partial  contribution  by  column,  eq.  5.11  and  v/S  = 2/2  = 1 are  used.  The  result 
is 


where  fo2  = -foi  and  fo3  = -foQ.  Again,  using  eq.  5.11  and  v/S  = 2/1=2  will  result  in  a 
1x8  partial  contribution  as  follows 


(5.17) 
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where  fo4  = fo3»  fo7=foo-  Next,  perform  the  expansion  by  row.  By 

using  eq.  5.10  and  u/R  = 4/4  = 1,  the  1x8  partial  contribution  can  be  expanded 
to  2 X 8.  Similarly,  by  using  eq.  5.10  and  u/R  = 4/2  = 2,  the  2 x 8 partial 
contribution  can  be  expanded  to  4 x 8.  Finally,  using  eq.  5.10  and  u/R  = 4/1  = 4, 
the  4x8  partial  contribution  can  be  expanded  to  its  full  size  which  is  8 x 8. 

5.2.4.3  Computation  of  Partial  Contribution 

The  partial  contribution  can  be  obtained  by  the  use  of  eq.  5.9  directly.  Assume 
the  values  of  cosine  functions  for  different  combinations  of  x and  u (or  y and  v)  are 
precalculated  and  stored  as  a table.  Let  Q be  the  number  of  multiplications  required 
to  find  the  partial  contribution  due  to  Then,  Q=2MN  if  both  u and  v are  not 
zero,  Q = 3MN  if  u = 0 or  v=0  but  not  both,  and  Q = 0 if  u and  v are  both  zero.  For 
an  M X N target  matrix  with  n nonzero  F„^,  where  1 < n < < MN,  the  number  of 
multiplications  needed  to  get  the  goal  matrbc  is  from  2(n-l)MN  to  3nMN.  With  this 
naive  approach,  no  addition  but  128  to  384  multiplications  are  required  if  M=N  = 8 
and  n = 2.  Comparing  to  the  best  8x8  fast  2-D  IDCT  algorithms  reported  so  far 
which  require  96  multiplications  (Duhamel  and  Guillemot  [1990],  Cho  and  Lee 
[1991]),  this  naive  approach  is  not  comparable  to  them.  Note  that  in  this  naive 
approach,  the  symmetrical  property  of  partial  contribution  is  not  exploited.  From  a 
signal  flowgraph’s  point  of  view,  the  naive  approach  uses  a signal  flowgraph  in  Fig. 
4.1(a).  To  exploit  the  symmetrical  property  of  partial  contribution,  we  should  use  the 
signal  flowgraph  in  Fig.  4.3.  To  construct  such  a signal  flowgraph  for  the  IDCT,  we 
can  follow  the  principle  discussed  in  section  4.3.3.  However,  we  want  to  introduce  an 
alternative  here.  It  can  produce  the  desired  signal  flowgraph  and  demonstrate 
simultaneously  the  computational  redundancy  associated  with  the  traditional  fast 
IDCT  algorithms. 

Eqs.  5.8  and  5.9  are  equivalent  if  only  one  term  (u,v)  in  the  double  summation 
of  eq.  5.8  is  nonzero.  So  the  traditional  fast  algorithms  for  eq.  5.8  can  be  applied  to 
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eq.  5.9  as  well.  However,  direct  use  of  them  to  compute  the  partial  contribution  is 
not  desirable  since  they  contain  high  computational  redundancy.  With  a systematic 
reduction  rule  for  the  signal  flowgraphs  of  traditional  algorithms  plus  the  use  of 
mirror  effect  of  partial  contribution,  a much  faster  way  of  computing  the  partial 
contribution  than  the  naive  approach  described  above  is  possible. 

The  basic  principle  is  to  reduce  the  signal  flowgraph  of  a traditional  algorithm 
by  retaining  only  the  nontrivial  paths.  This  principle  is  demonstrated  by  an  example 
as  follows.  Consider  the  indirect  (or  row-column)  approach  of  a fast  2-D  IDCT  for 
a 4 X 4 target  matrix.  Chen’s  algorithm  (Chen  et  al.  [1977])  is  chosen  here  for 
retaining  the  nontrivial  paths  because  it  is  simple  and  well  recognized.  Normally,  8 
4-point  TD  IDCTs  are  needed  to  accomplish  the  task  (with  very  complicated  data 
reordering,  4 4-point  IDCTs  are  enough  (Cho  and  Lee[1991j).  However,  in  our  case 
at  most  3 4-point  IDCTs  are  necessary  (1  along  the  rows  (or  columns)  of  the  target 
matrix  to  get  an  intermediate  matrix  and  2 along  the  columns  (or  rows)  of  the 
intermediate  matrix  to  get  a 2 x 2 submatrix  of  the  partial  contribution).  The  other 
three  2x2  submatrices  can  be  derived  automatically  by  the  use  of  mirror  effect. 
Furthermore,  each  4-point  IDCT  can  be  done  efficiently  since  only  one  input  data 
out  of  4 is  nonzero.  Consider  the  signal  flowgraph  for  a 4-point  IDCT  (Chen  et  al. 
[1977])  shown  in  Fig.  5.3(a).  The  outputs  of  the  4-point  IDCT  (denoted  by  fg,  fj,  {2, 
and  13)  can  be  treated  as  linear  combinations  of  the  4 inputs  (denoted  by  Fq,  Fj,  F2, 
and  F3).  Since  only  one  of  the  inputs  is  nonzero.  Figs.  5.3(a)  and  5.3(b)  are 
functionally  equivalent.  The  signal  flowgraph  in  Fig.  5.3(b)  can  be  further  simplified 
by  retaining  only  two  of  the  four  outputs  (fo  and  fj)  as  shown  in  Fig.  5.3(c)  because 
the  other  two  outputs  can  be  derived  by  the  use  of  mirror  effect.  Further  reduction 
of  some  of  the  subgraphs  in  Fig.  5.3(c)  is  possible.  Specifically,  the  subgraphs  with 
input  Fq  and  F2  are  reducible.  The  final  irreducible  subgraphs  are  shown  in  Fig. 
5.3(d).  Instead  of  using  the  signal  flowgraph  in  Fig.  5.3(a)  to  calculate  a 4-point 
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IDCT,  one  of  the  four  simpler  subgraphs  in  Fig.  5.3(d)  is  used.  The  saving  in 
computation  is  obvious.  The  saving  is  even  more  significant  for  higher  order  IDCT. 
Consider  an  8 point  IDCT.  The  signal  flowgraph  generated  by  using  Chen’s  algorithm 
(Chen  et  al.  [1977])  is  shown  in  Fig.  5.4(a).  It  can  be  decomposed  into  8 simplified 
subgraphs  as  shown  in  Fig.  5.4(b).  Some  of  the  subgraphs  in  Fig.  5.4(b)  can  be 
further  simplified  to  irreducible  subgraphs  as  shown  in  Fig.  5.4(c).  Note  that  there 
are  some  intermediate  nodes  in  the  irreducible  subgraphs  with  inputs  Fj,  F3,  Fj,  and 
F7.  Those  intermediate  nodes  can  be  removed  by  combining  the  gain  associated  with 
each  branch  until  one  final  gain  is  produced.  The  result  is  shown  in  Fig.  5.4(d).  Each 
output  can  always  be  represented  as  a product  of  a single  constant  and  the  input 
since  no  loop  or  feedback  is  in  the  signal  flowgraph.  Thus,  every  intermediate  node 
is  removable  no  matter  what  the  size  of  the  output  is  and  how  complicate  the 
subgraph  looks  like.  For  convenience,  the  subgraphs  shown  in  Figs.  5.4(c)  and  5.4(d) 
are  said  to  be  in  their  primitive  forms.  In  other  words,  they  can  not  be  reduced  or 
simplified  any  more. 

The  primitive  subgraph  with  input  F„  and  the  one  with  input  F^  can  be 
cascaded  as  a signal  flowgraph  to  compute  part  of  the  partial  contribution  due  to  F„^. 
The  connection  rule  is  that  at  each  output  of  the  first  subgraph,  the  second  subgraph 
is  cascaded.  Which  subgraph  should  be  the  first  is  immaterial  as  far  as  the  result  is 
concerned.  However,  the  computational  complexity  may  be  different.  This  issue  will 
be  addressed  in  the  next  section.  Once  part  of  the  partial  contribution  is  obtained, 
the  expansion  to  its  full  size  can  be  done  as  demonstrated  earlier  in  this  section. 

The  general  procedure  to  construct  primitive  subgraphs  is  summarized  as 
follows. 

Step  1.  Construct  the  signal  flowgraph  based  on  a traditional  fast  1-D  IDCT 
algorithm. 
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(b) 

Fig.  5.3  (a)  A 4-point  IDCT  (b)  signal  decomposition  of  part  a (c)  simplified 
version  of  part  b (d)  irreducible  subgraphs  of  part  a. 
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(c) 


Fig.  5.3  Continued. 
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(a) 


Fig.  5.4  (a)  An  8-point  IDCT  (b)  8 decomposed  signals  of  part  a (c) 
irreducible  subgraphs  of  part  a (d)  primitive  forms  of  part  c. 
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Fig.  5.4  Continued. 
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Fig.  5.4  Continued. 
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Fig.  5.4  Continued. 
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Step  2.  For  every  input  node,  construct  a subgraph  by  retaining  only  the  nontrivial 
paths  from  that  input  node  to  all  output  nodes. 

Step  3.  Each  subgraph  can  be  simplified  by  the  repeat  use  of  mirror  effect  of  partial 
contribution  until  it  is  irreducible. 

Step  4.  Remove  intermediate  nodes  in  the  irreducible  subgraph  if  any. 

5.2.5  An  Efficient  IDCT  Algorithm  for  Inverse  PIT 

In  this  section,  an  efficient  IDCT  algorithm  is  proposed  for  the  inverse  PIT 
process  based  on  the  approach  discussed  in  previous  sections. 

Let  n be  the  number  of  nonzero  elements  in  a given  target  matrix  with  size 
M X N.  For  convenience,  nonzero  elements  are  labelled  in  order  from  0 to  n-1.  Let 
[g]  be  the  goal  matrix  to  be  derived.  Let  [0]  be  a null  matrix.  The  proposed  algorithm 
is  summarized  as  follows. 

1.  Declare  a temporary  matrix  [t]  = [0]  with  size  M x N. 

2.  Let  i = 0. 

3.  Fetch  element  i from  the  target  matrix. 

4.  Compute  the  partial  contribution  [p]  of  element  i. 

5-1.  If  i is  even,  [t]  = [p]. 

5-2.  If  i = 1,  [t]  = [t]  + [p],  [g]  = [t]. 

5-3.  If  i is  odd  and  i > 1,  [t]  = [t]  + [p],  [g]  = [g]  + [t]. 

5-4.  If  i = n-1  and  n is  odd,  [g]  = [g]  + [t]. 

6.  If  i = n-1,  [g]  is  the  result,  stop;  otherwise,  i = i + 1,  go  to  Step  3. 

5.2.5. 1 Number  of  Multiplications 

Only  Step  4 of  the  algorithm  involves  multiplications.  Therefore,  the  number 
of  multiplications  of  the  algorithm  is  equal  to  that  of  computing  partial  contributions 
due  to  all  available  nonzero  elements. 
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The  complexity  of  computing  part  of  the  partial  contribution  due  to  can  be 
examined  by  checking  the  primitive  subgraphs  with  inputs  F„  and  F^.  The  two 
primitive  subgraphs  are  cascaded  in  the  way  described  earlier.  If  the  first  subgraph 
takes  P multiplications  and  the  other  requires  Q multiplications,  then  the  total 
number  of  multiplications  required  to  obtain  part  of  the  partial  contribution  would 
be  P + PQ  multiplications.  Alternatively,  P and  Q are  also  the  number  of  output 
nodes  for  one  subgraph  and  another,  respectively.  So  P and  Q can  be  obtained  by 
counting  the  number  of  output  nodes  of  the  irreducible  subgraphs.  Since  P+PQ  = 
P(l  + Q),  a fast  way  to  tell  the  required  number  of  multiplications  is  to  take  the 
product  of  P and  Q+ 1.  P + PQ  multiplications  will  also  be  the  complexity  to  compute 
the  full  size  partial  contribution  since  no  addition  operations  are  involved  and  the 
expansion  of  partial  contribution  to  its  full  size  adds  virtually  no  complexity.  Suppose 
the  order  of  the  two  subgraphs  is  reversed  when  cascaded,  the  complexity  becomes 
Q + QP.  But  Q + QP  * P+PQ  if  P Q.  Thus,  the  order  of  the  subgraphs  is  relevant 
to  the  complexity.  If  P<Q,  P + PQ  is  always  smaller  than  Q + QP.  Therefore,  the 
order  selection  should  be  such  that  the  first  one  requires  less  complexity  than  the 
second  one.  The  numbers  of  multiplications  required  for  different  combinations  of 
u and  V are  shown  in  Table  5.1  and  Table  5.2  for  a 4 x 4 block  and  an  8 x 8 block, 
respectively.  Note  that  for  u = 0 or  M/2  and  v = 0 or  N/2,  no  multiplication  is  required 
(except  a left  shift  operation  by  one  bit).  To  predict  the  required  number  of 
multiplications  for  large  M and  N,  an  analytical  analysis  for  the  cases  of  M = N is 
given  next. 

For  convenience  as  well  as  practical  consideration,  let  N=2^,  where  p is  a 
positive  integer.  In  this  case,  cd(N,r)  can  be  1,2,4,...,  N/2.  For  a particular  r^^O,  if 
I^max=  ^rhas  Only  N/2-degree  of  self-symmetry.  If  R^a^=2,  § ^has  both  N/4-degree 
and  N/2-degree  of  self-symmetry.  In  general,  we  will  have  N/2R-degree  of  self- 
symmetry, where  R = 1,2,4,...,  R„,ax-  Bo*"  r=0,  is  simply  a constant  from  eq.  5.2a.  In 


115 


Table  5.1  The  number  of  multiplications  associated  with  (4  x 4). 
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Table  5.2  The  number  of  multiplications  associated  with  (8  x 8). 
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10 

20 
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Table  5.3  The  values  of  and  gN  = N/2R„,ajj  in  terms  of  r. 


N=2 

II 

00 

II 

II 

r 

^max 

Sn 

^max 

8n 

^max 

8n 

^max 

8n 

0 

1 

1 

2 

1 
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1 

1 

1 

1 

1 

2 

1 

4 

1 

8 

2 

2 

1 

2 

2 

2 

4 

3 

1 

2 

1 

4 

1 

8 

4 

4 

1 

4 

2 

5 

1 

4 

1 

8 

6 

2 

2 

2 

4 

7 

1 

4 

1 

8 

8 

8 

1 

9 

1 

8 

10 

2 

4 

11 

1 

8 

12 

4 

2 

13 

1 

8 

14 

2 

4 

15 

1 

8 
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this  case,  it  has  all  degrees  of  symmetry,  i.e.,  from  N/2  to  1.  Thus,  the  equivalent 
Rmax  is  N/2.  For  N= 2,4,8,  and  16,  the  values  of  R^ax  N/2R^ajj  in  terms 

of  r are  shown  in  Table  5.3.  Note  that  R„,a,(  is  the  function  of  both  N and  r.  Thus,  it 
can  be  expressed  as  Rn,a,j(N,r).  For  example,  Rn,g,((8,3)  = 1 from  Table  5.3. 

Now  let’s  examine  the  computational  saving  by  the  use  of  self-symmetry.  One- 
dimensional case  is  considered  first  and  then  extended  to  two  dimensional  case.  For 
one-dimensional  data,  N/2R^g^(N,r)  is  the  number  of  multiplications  for  eq.  4.13  to 
reconstruct  the  data  at  the  r-th  stage  given  the  length  of  the  data  is  N.  For 
convenience,  we  define  ff,j(r)  and  gN(r)  as  follows: 


Mr)=R^(N,r) 


(5.18) 


N 


(5.19) 


Thus,  gN(r)  is  just  a short  hand  notation  for  N/2R„a^(N,r).  We  will  derive  an 
analytical  expression  for  the  average  of  gfg(r)  in  terms  of  any  N = 2p.  Let  the  average 
of  g^(r)  be  Gn,  then 


N-l 


N-l 


(5.20) 


2^-1 


Gm'-  T,  — 

2 So  Ur) 

The  following  relationships  are  also  obtained  by  the  inspection  of  Table  5.3: 


(5.21) 


/2n(^)=N,M0)=N/2 

f2tm=um)=N 

/2w('')  =///'■)» 

By  using  the  relationships  in  eq.  5.22,  eq.  5.21  can  be  written  as 


(5.22) 


G»-2G„-i 


(5.23) 


The  solution  to  eq.  5.23  is 
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(5.24) 


Thus  the  average  number  of  multiplications  for  each  stage  of  reconstruction  has  been 
reduced  from  N to  eq.  5.24.  For  large  N,  N/3,  i.e.,  only  one  third  of  the  original 
number  of  multiplications  is  required. 

Consider  a set  of  N x N data, 


From  eq.  4.18,  is  the  required  computation  for  one  stage  of 

approximation  of  the  original  data.  is  the  average  number  of  multiplications 
required  to  compute  or  to  compute  yij^i‘$j*  given  that  has  been 

precomputed  and  stored  as  a look-up  table.  However,  the  size  of  the  table  increases 
in  the  order  of  or  O(N^).  Note  that  there  are  pairs  of  (i,j),  each  of  which 
requires  a table  with  O(N^).  Thus,  the  overall  size  of  the  entire  tables  is  0(N'’).  This 
may  not  be  acceptable  in  a system  where  large  N is  implemented  and  the  memory 
spaces  are  limited. 

One  way  to  get  around  this  problem  is  to  compute  y^#/  first,  followed  by 
(yij$i‘)$j  . In  this  case,  the  size  of  the  table  is  in  0(N)  since  only  one  dimensional 
basis  function  is  in  the  table.  Since  Q<i<N-l,  the  overall  size  of  the  entire  tables 
is  in  0(N  ).  With  this  strategy,  the  memory  requirement  is  greatly  reduced  at  the 
expense  of  a small  increase  in  computational  complexity.  This  increase  in 
computational  complexity  is  analyzed  as  follows.  The  average  number  of 
multiplications  for  one  stage  of  reconstruction  is 


N-l  N-l 


(5.25) 


Thus,  Gnxn  is  simply  GN^  that  is. 


(5.26) 


119 


E M'')  ^8j^r)g^s)]  (5.27) 

//  r=0  s=0 

That  is,  the  average  number  of  multiplications  is  increased  by  G^. 

The  number  shown  in  eq.  5.27  can  be  reduced  by  the  use  of  the  following 
strategy.  Note  that  yy$j*  can  be  computed  first,  followed  by  (yij^j*)#;'.  The  result  will 
be  the  same  if  we  compute  first,  followed  by  However,  the  required 

computational  complexity  maybe  different,  i.e.,  gN(r)  + BnCOSnCs)  gN(s)  + gN(s)gN(0 
if  gN(r)  gN(s).  Furthermore, 

8f^r)<gf^s)  -*  gn(.r)+gj^r)g/s)  < gf^s)+gj^s)gf^r)  (5.28) 

By  imposing  this  condition,  eq.  5.27  can  be  written  as 


1 


N‘ 


E 8j/r)[l+8N(s)]+  8f^s)[l+g^r)] 


(5.29) 


Define  and  to  be 


^N=  E 5//^)[l+5vv(-s)]=  E 5t/s)[1 +g^r)] 

gdry^gt^)  gi^r)>g^s) 


(5.30) 


and 


E (5-31) 

gn(r)’gfM) 

respectively.  Then  eq.  5.29  is  equivalent  to 

(5.32) 

iV^ 

By  the  inspection  of  Table  5.3,  the  following  recurrence  relationship  was  found  for 
An 


l0g2A^-l 


2 (=2  \ 2 
With  a lengthy  algebra  manipulation,  we  get 


\ 


/ 


(5.33) 
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Af,=A^l2^j-N(N^2)(N^^S)  (5.34) 

Afj  has  the  form  of  A[sj=Af^^2+h(N).  The  analytical  solution  to  this  recurrence 

relation  is 

1082^^-3 

An=A^^  E W2')  (5-35) 

i=0 

By  using  eq.  5.35  and  the  fact  that  A4  = 12  (after  an  easy  derivation  based  on  Table 
5.3),  eq.  5.34  is  found  to  be 

^^=12+^(iV^-256)+— (lV^-64)+-(lV^-16)+-(//-4),  N^4  (5.36) 

45  21  9 3 

Similarly,  has  the  following  recurrence  relationship  by  the  inspection  of  Table 
5.3: 


The  analytical  solution  to  eq.  5.37  is 


/^\3/ 
v2; 


2 


\ 


/ 


(5.37) 


fi^=32+— (iV'‘-256)+-(lV3-64),  N^4  (5.38) 

15  7 

Substituting  A[,j  and  B[sj  of  eq.  5.32  with  eq.  5.36  and  eq.  5.37,  respectively,  will  lead 
to  the  final  result 


1 


56 +1(1V‘^ -256) +— (VV^ -64) +-(iV^  - 16) +-(lV-4) 
9 21  9 3 


, N^4 


(5.39) 


For  the  case  when  N = 2,  the  average  number  of  multiplications  is  simply  2 based  on 


Table  5.3. 


For  comparison,  the  expressions  of  the  average  numbers  of  multiplications 
shown  in  eqs.  5.26  ,5.27,  and  5.39,  are  called  Ej,  E2,  and  E3,  respectively.  They  are 
functions  of  N.  Some  values  of  Ej,  E2,  and  E3  in  terms  of  N and  NxN  are  given  in 
Table  5.4.  The  values  listed  in  Table  5.4  are  plotted  in  Fig.  5.5.  The  lower-left 
portion  of  Fig.  5.5  is  magnified  and  shown  in  Fig.  5.6  for  a closer  view  of  the  figure. 
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Table  5.4  Some  values  of  Ej,  £3,  and  Ej  in  terms  of  N and  NxN, 


N 

NxN 

El 

E, 

E, 

2 

4 

1.0 

2.0 

2.0 

4 

16 

2.25 

3.75 

3.50 

8 

64 

7.56 

10.31 

9.63 

16 

256 

28.89 

34.27 

32.78 

32 

1024 

114.22 

124.91 

121.88 

64 

4096 

455.56 

476.90 

470.81 

128 

16384 

1820.89 

1863.56 

1851.38 

Avg.  number  of  multiplications  per  coefficient 


E1  E2  E3 


Fig.  5.5  A plot  of  the  data  in  Table  5.4. 
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E1  E2  E3 


Fig.  5.6  A plot  of  part  of  the  data  in  Table  5.4. 
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We  can  draw  some  conclusions  from  Table  5.4,  Fig.  5.5  and  Fig.  5.6.  First,  for 
N = 2,  Ej<  E2.  For  N>2,  Ej<  E3<  E2.  This  result  is  expected.  Second,  for  a very 
large  N,  we  have  Ej«  E/s  E2.  Thus,  for  a large  N,  the  advantage  of  low  complexity 
of  the  first  approach  is  gone  completely.  Third,  E/s  E2  for  all  values  of  N.  This 
means  that  the  third  approach  does  not  lower  the  computational  complexity  of  the 
second  approach  significantly.  Therefore,  for  the  simplicity  of  implementation,  the 
second  approach  may  be  better  than  the  third  one.  For  lower  values  of  N (say  up  to 

N = 16),  the  first  approach  is  better  at  the  expense  of  a larger  but  acceptable  size  of 
the  look-up  table. 

According  to  Table  5.2,  for  the  8x8  case  the  average  number  of  multiplications 
required  for  a nonzero  element  can  be  estimated.  Assume  that  the  chance  of  a 
nonzero  element  falling  in  any  u-v  pair  is  equally  likely.  Then  the  average  will  be 
1/64  of  the  sum  of  all  the  numbers  shown  in  Table  5.2.  The  result  is  9.5 
multiplications  per  nonzero  element.  This  is  about  7 to  20  times  faster  than  the  naive 
approach  mentioned  in  subsection  5.2.4.3.  In  addition,  this  result  is  even  good  enough 
for  general  image  decompression  using  the  JPEG  standard  (not  in  progressive  mode). 
In  fact,  for  this  particular  application,  the  DC  element  is  almost  always  nonzero  and 
the  other  nonzero  elements  tend  to  fall  in  the  u-v  pairs  clustering  around  the  DC 
element.  So  a more  accurate  estimate  of  the  average  should  be  less  than  9.5.  For 
example,  if  only  the  first  8 elements  along  the  zig-zag  scanning  pattern  are  nonzero, 
the  average  number  of  multiplications  per  nonzero  element  is  6.4. 

5.2.5.2  Number  of  Additions 

Note  that  no  addition  operations  are  required  in  the  calculation  of  partial 
contributions.  Therefore,  only  Step  5 of  the  algorithm  involves  addition  operations. 
There  are  two  sources  of  addition  operations  in  Step  5:  one  for  updating  [t]  and  one 
for  updating  [g].  If  n is  even,  updating  [t]  takes  (n/2)MN  additions  and  updating  [g] 
requires  (n/2-l)MN  additions,  a total  of  (n-l)MN  additions.  If  n is  odd,  updating  [t] 
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takes  (n/2-l)MN  additions  and  updating  [g]  requires  (n/2)MN  additions.  Thus,  the 
total  number  of  additions  is  always  (n-l)MN.  For  n=M=N  = 8,  this  number  is  448 
which  is  very  comparable  to  the  result  reported  by  other  studies  so  far  which  is  from 
430  to  580.  In  fact,  (n-l)MN  is  only  an  upper  bound.  The  actual  number  of  additions 
to  update  [t]  can  be  reduced  by  considering  the  mirror  effect  of  the  partial 
contribution. 

Let  n be  the  total  number  of  nonzero  elements  in  the  target  matrbc  and  the 
element  at  location  (u,v)  = (Uj,Vj)  be  the  i-th  nonzero  element  to  be  processed,  where 
i is  even  and  0 < i < n-1.  Let  j = i+ 1.  Without  the  loss  of  generality,  let  M and  N be 
even.  By  Step  5-1  of  the  algorithm,  the  partial  contribution  due  to  element  i,  denoted 
as  [p,^]j,  has  been  stored  in  [t].  So  [t,^]  = [Pxy]j  for  0 < x < M-1,  0 < y < N-1.  For  0 < 
X < M/2-1,  0 < y < N/2-1,  it  can  also  be  expressed  as 


[g  _= 

-l-x,yJ  - 
l^M-l-x,N-l-yJ 


iIm- 
t. 


(5.40) 


where  a =Uj,  =Vj,  and  y =Uj+Vj.  Note  that  no  addition  operation  is  required  in  this 


step.  Then,  by  Step  5-2  or  5-3,  [t]  is  updated  by  the  next  partial  contribution  [p^L, 
i.e.,  [t,^]  = [p^Ji  + [p^]j,  for  0 < x < M-1,  0 < y < N-1.  For  0 < x < M/2-1,  0 < y < 
N/2-1,  it  can  be  written  as 


iWl-y] 

BM-l-x,N-l-yJ  “ 


= [PJi  + [Pji 

-iffixyji  + 


i/iD 'ji  + -irM 


(5.41) 


where  A = Uj,  B=Vj,  and  r =Uj+Vj.  If  Uj,  Vj,  Uj,  and  Vj  are  all  even,  then  eq.5.41  can  be 
rewritten  as 


[ixy]  [Pipli  [Pxy]j 

Ltx,N-l-yJ  - lU 
L^M-l-x,N-l-yJ  IV-I 


(5.42) 
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for  0 < X < M/2-1,  0 < y < N/2-1.  In  this  case,  MN/4  additions  are  required  rather 
than  MN.  If  all  Uj  and  V;  are  even,  where  0 < i < n-1,  we  would  get  the  following 
number  of  additions  for  updating  [t]:  (n/2)MN/4  if  n is  even,  and  (n/2-l)MN/4  if 
n is  odd.  The  number  of  additions  required  to  update  [g]  is  unchanged.  So  in  this 
case,  the  total  number  of  additions  would  be  (5n/8-l)MN  if  n is  even  and  (5n/8- 
1/4)MN  if  n is  odd.  For  n = M=N  = 8,  the  number  is  256  , which  is  significantly  lower 
than  the  upper  bound  of  448.  Note  that  the  result  derived  above  is  based  on  the 
assumption  that  each  updating  cycle  of  [t]  takes  MN/4  additions.  In  fact,  in  some 
updating  cycles,  less  than  MN/4  additions  is  possible.  For  example,  if  Uj=4,  Vj  = 0, 
Uj=4,  Vj=4,  then  the  [t^^y]  in  eq.  5.42  for  0<  x<  M/2-1,  0<  y < N/2-1  can  be  written 
as 


[ixy]  [Pxv]i  tPiyi 


tx,N/2-l-y. 


t f 

cl 


=Twi 


(5.43) 


for  0 < X < M/4-1,  0<  y < N/4-1.  This  leads  to  MN/16  additions  rather  than  MN/4. 
Therefore,  neither  (5n/8-l)MN  (n  = even)  nor  (5n/8-l/4)MN  (n  = odd)  is  a lower 
bound.  The  true  lower  bound,  which  is  difficult  to  derive  analytically,  is  lower  than 
either  one.  Nevertheless,  the  result  is  still  useful  because  it  provides  a rough  idea 
about  how  well  the  algorithm  can  perform.  For  convenience,  it  is  called  a pseudo 
lower  bound.  After  deriving  the  upper  bound  and  a pseudo  lower  bound,  the  average 
case  is  examined  next. 

Eq.  5.41  is  shown  here  again  for  convenience. 


[txy] 
PM-l-x,y  ~ 
[tx,N-l-y]  = 
PM-l-x,N-l-yJ  “ 


- [t^yli  + [Ui 

ttxyj. 


(5.44) 

(5.45) 

(5.46) 

(5.47) 


where  a -Uj,  )0  =Vj,  y =Uj+Vj,  A = Uj,  B=Vj,  and  r =Uj+Vj.  If  a and  A are  both  odd  or 
even,  eq.  5.45  requires  no  addition  operation.  Similarly,  if  p and  B are  both  odd  or 
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even,  eq.  5.46  requires  no  addition  operation.  Finally,  if  y and  r are  both  odd  or 
even,  eq.  5.47  requires  no  addition  operation. 

Suppose  that  the  nonzero  elements  in  the  target  matrix  are  chosen  randomly. 
In  addition,  the  target  matrix  is  assumed  to  be  sparse  enough,  i.e.,  MN  > > n.  Then, 
we  may  assume  that  the  even-odd  status  of  (u;,Vj)  is  independent  of  the  status  of 
(U|^,V|.),  for  all  k < i.  In  this  case,  Pr(Uj  is  even  and  is  even),  Pr(Uj  is  even  and  Vj  is 
odd),  Pr(Uj  is  odd  and  Vj  is  even),  and  Pr(Uj  is  odd  and  Vj  is  odd)  are  all  equal  to  1/4, 
where  Pr(x)  is  the  probability  that  event  x occurs.  The  above  probability  distributions 
are  also  applied  for  (Uj,Vj).  Under  the  independence  assumption,  the  probabilities  of 
the  joint  even-odd  status  of  (Uj,Vj)  and  (Uj,Vj)  are  simply  the  product  of  each  individual 
probability  of  the  even-odd  status.  They  are  all  1/16.  Some  combinations  of  the  joint 
even-odd  status  result  in  less  computation  effort  than  others.  For  example,  if  Uj,  Vj, 
Uj,  and  Vj  are  all  even,  only  a quarter  of  the  full  load  computation  is  required,  where 
the  full  load  computation  is  defined  as  the  MN  addition  operations.  If  U;  and  Vj  are 
both  odd  and  Uj  and  Vj  are  both  even,  three  quarters  of  the  full  load  computation  are 
required  since  only  eq.  5.47  needs  no  computation  while  eq.  5.44  to  eq.  46  require 
3MN/4  operations.  Computational  conditions  of  other  combinations  of  even-odd 
status  are  given  in  Table  5.5.  From  the  table,  the  average  number  of  addition  to 
update  [t]  is  5/8  of  the  full  load  computation.  But  updating  the  goal  matrix  by 
[g]  “[§]  + [!]  needs  a full  load  computation.  Therefore,  in  terms  of  the  full  load 
computation,  the  total  number  of  addition  operations  for  all  nonzero  elements  is 

(5/8)(n/2)  + (n/2-l)  = 13n/16-l  (n  is  even)  (5.48) 

or 

(5/8)(n/2-l)  + (n/2)  = 13n/16-5/8  (n  is  odd)  (5.49) 

where  n is  the  number  of  nonzero  elements  in  the  target  matrix.  It  is  easy  to  verify 
that  the  eqs.  5.48  and  5.49  are  smaller  than  n-1  which  is  the  upper  bound  described 
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earlier  in  this  section.  Several  values  of  n and  their  corresponding  computation  loads 
are  given  in  Table  5.6,  where  the  ratio  x/(n-l)  is  also  shown  in  Fig.  5.7.  For  n>  1, 

13  . 13  5 

16  16  8 13  (5.50) 

n-1  ° n-1  “ l6 

This  means  that  it  takes  about  13/16  of  the  upper  bound  for  almost  all  values  of  n. 
Again,  a more  accurate  estimate  of  the  average  number  should  be  lower  than  that 
in  eq.  5.52  or  eq.  5.49  just  as  in  the  case  of  lower  bound  estimation. 

The  results  shown  in  eq.  5.48  to  eq.  5.49  are  based  on  the  assumption  that  the 
nonzero  elements  are  chosen  randomly.  If  the  nonzero  elements  can  be  chosen  in  a 
particular  order,  the  actual  number  of  additions  required  can  be  lower  than  that 
shown  previously.  For  example,  the  nonzero  elements  can  be  divided  into  four  groups 
as  (even,even),  (even,odd),  (odd,even),  and  (odd,odd)  according  to  their  even-odd 
status  of  the  location  indexes.  The  elements  in  one  group  are  processed  in  a 
sequence,  and  switch  to  another  group  only  when  all  the  elements  in  the  current 
group  are  processed.  With  this  strategy,  the  required  number  of  additions  can  be 
much  lower  than  13/16  of  the  upper  bound. 

5.2.5.3  The  Scanning  Pattern  and  Delay  Time  Consideration 

Zig-zag  scanning  pattern  is  good  for  a wide  variety  of  images.  However,  it  is  not 
optimal  for  individual  images.  The  optimal  or  near  optimal  scanning  pattern  for  a 
picture  may  be  quite  different  from  those  for  other  images.  In  fact,  one  of  the  current 
research  areas  of  the  PIT  is  to  find  a better  scanning  pattern  for  a particular  image 
based  on  the  human  visual  system  (Chitprasert  and  Rao  [1990]).  The  proposed 
approach  in  the  dissertation  is  suitable  for  any  scanning  patterns  since  each  element 
of  the  input  matrix  is  treated  separately.  Therefore,  it  coincides  with  the  latest 
research  development  in  the  PIT.  In  addition,  it  can  start  computing  as  soon  as  a 
nonzero  DCT  coefficient  is  received.  Thus,  it  has  virtually  no  delay  time. 


129 


Table  5.5  Different  computational  loads  to  update  the  temporary  matrix  [t]  in 

terms  of  the  full  load  computation  (e  = even,  o = odd). 


Uji  e,  Vj:e 

Uj:  e,  Vj:o 

Uj:  0,  Vjie 

Uji  0,  V::o 

Uji  e,  Vjie 

1/4 

3/4 

3/4 

3/4 

u,:  e,  v,:o 

3/4 

1/3 

3/4 

3/4 

Uji  0,  v,:e 

3/4 

3/4 

1/4 

3/4 

Uji  0,  Vj:o 

3/4 

3/4 

3/4 

1/4 

Table  5.6  Average  number  of  additions  in  terms  of  the  full  load  computation. 


n 

2 

3 

4 

5 

6 

7 

8 

9 

5 

29 

9 

55 

31 

81 

11 

107 

8 

16 

4 

16 

8 

16 

2 

16 

n-l 

1 

2 

3 

4 

5 

6 

7 

8 

Average  case  to  worst  case  ratio 
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Fig.  5.7  Asymptotic  behavior  of  the  ratio  between  average  and  worst. 
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5.3  Discrete  Fourier  Transformed  Images 


The  performance  of  the  DFT  is  inferior  to  the  DCT  in  terms  of  image 
compression  (Rao  and  Yip  [1990]).  So  why  discuss  the  DFT  images?  First,  recall  that 
unitary  transforms  rather  than  orthogonal  transforms  were  discussed  in  Chapters  III 
and  IV.  The  DFT  is  the  only  complex  sinusoidal  transform  discussed  in  section  4.4.2. 
The  discussion  of  both  the  DFT  and  DCT  can  demonstrate  the  generality  of  the 
principles  discussed  in  Chapters  III  and  IV.  Second,  the  DFT  is  chosen  here  to 
demonstrate  how  to  deal  with  progressive  reconstruction  problems  with  complex 
transforms.  The  treatment  of  using  complex  transforms  for  the  PIT  will  be  slightly 
different  than  that  of  using  real  transforms.  Third,  the  DFT  is  probably  the  most 
frequently  and  widely  used  transform  in  both  one  and  two  dimensional  signal 
processing.  Thus,  it  is  important  to  see  if  the  principles  discussed  in  Chapter  III  and 
IV  can  also  be  applied  to  such  a popular  transform.  Fourth,  there  are  a lot  of  fast 
transforms  available  which  are  called  fast  Fourier  transforms  (FFT).  The 
performance  of  the  FFT  can  be  compared  to  that  of  our  approach  . 

The  basis  functions  of  the  DFT  are  shown  in  Table  4.1  as 


for  r,  k = 0,l,...,  N-1.  Note  that  either  plus  or  minus  sign  in  the  exponential  term  of 
equation  5.51  can  be  used  for  the  forward  transform.  They  are  both  valid 
orthonormal  functions.  Let  the  r-k  element  of  one  unitary  DFT  matrix  # be 


for  r,  k = 0,l,...,  M-1.  Similarly,  let  the  r-k  element  of  another  unitary  DFT  matrix  Y 


1 ^±J2nrk/N 


(5.51) 


(5.52) 


be 
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1 


4n 


(5.53) 


for  r,  k = 0,l,—,  N-1.  Let  f be  an  M x N matrix  representing  the  2-D  data  (real 
numbers  not  complex  numbers)  and  F be  its  2-D  DFT.  Then,  a common  definition 
of  the  2-D  DFT  is 


<l>oo 


4) 


10 


4>0,w-l 

4>i,w-i 


4>M-1,0  •••  4>W-1,W-1 


Joo 

f\Q 


fu-Xfi 


4^00  •••  Vi,o 

4^01  • 


■■■  '^N- 


1^-1. 


Substituting  eqs.  5.52  and  5.53  into  eq.  5.54  gives  the  u-v  element  of  F 


Fu.- 


1 


M-l  N-l 


g -J2niiuiM+vylN) 


(5.54) 


(5.55) 


^MN x=0  y=0  ™ 

where  u = 0,  ...,  M-l,  v = 0,  ...,  N-l.  Similarly,  f is  given  by  the  2-D  inverse  DFT 
(IDFT)  of  F,  defined  as 


/=«D'FT 


4*00  4>w-i,o 

4^0,1  4>a/-i,i 


4) 


0,W-1 


4>w- 


00 


10 


0,N-l 


^M-1,0  "■  ^W-1,A1-1. 


4^00 


^0^-] 


10 


1,W-1 


Substituting  eqs.  5.52  and  5.53  into  eq.  5.56  gives  the  x-y  element  of  f 


M-l  N-l 


^MN «=o  v=o 


P p -j2n(uxfM*vylN) 


(5.56) 


(5.57) 


where  x = 0, ...,  M-l,  y = 0, ...,  N-l.  The  definitions  of  partial  contribution  for  the  DCT 
in  the  last  section  is  adopted  here  again.  Let  [fx,y]uv  be  the  partial  contribution  to  the 
goal  matrix  [f^J  due  to  F„^  alone,  then 
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(5.58) 


Note  that  the  constant  in  front  of  the  double  summation  of  eq.  5.57  has  been  ignored 
in  eq.  5.58  for  the  same  reason  described  in  the  last  section. 

From  eq.  5.55,  we  have 


By  combing  the  results  of  eqs.  5.59,  5.60,  5.61  and  5.62,  we  conclude  that  for  l<u<M- 


F^m/2.n/2  if  N are  even,  or  symmetrical  about  an  artificial  coefficient  F^jrt,N/2 

if  M and  N are  odd.  For  u = 0,  l<v<N-l,  the  DFT  coefficients  are  conjugate 
symmetrical  about  the  coefficient  Foj,jyr2  if  M and  N are  even  or  symmetrical  about 
an  artificial  coefficient  Fqn/2  ‘f  ^ and  N are  odd.  For  l<u<M-l,  v = 0,  the  DFT 
coefficients  are  conjugate  symmetrical  about  the  coefficient  F^^^q  if  M and  N are 
even  or  symmetrical  about  an  artificial  coefficient  F^,/2,o  if  M and  N are  odd.  The  DC 
coefficient  or  Fqo  is  conjugate  symmetrical  about  itself,  i.e.,  it  is  real. 

For  M and  N are  even,  Fg  Fm/2,0’  Fm/2,n/2  ^^e  also  conjugate  symmetrical 

about  themselves  respectively,  i.e.,  they  are  all  real.  This  statement  can  be  verified 
pitigging  (u,v)  = (0,N/2),  (M/2,0),  and  (M/2,N/2),  respectively  into  eq.  5.55.  For 
M and  N are  odd,  only  DC  coefficient  is  conjugate  symmetrical  about  itself. 


M-l  N-l 


(5.59) 


From  eqs.  5.59  and  5.55,  we  have 


(5.60) 


Similarly,  we  can  get 


(5.61) 


and 


(5.62) 


1,  l<v<N-l,  the  DFT  coefficients  are  conjugate  symmetrical  about  the  coefficient 
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Most  coefficients  and  other  coefficients  form  conjugate  symmetrical  pairs. 
Therefore,  only  about  half  of  all  the  coefficients  need  to  be  sent  for  communication 
purpose  because  the  coefficients  in  the  other  half  can  be  derived  automatically  by  the 
receiver  with  eqs.  5.59.  For  the  PIT,  suppose  that  one  DFT  coefficient  is  received, 
and  we  want  to  reconstruct  the  original  image  data  given  only  this  coefficient.  In 
other  words,  we  want  to  calculate  the  partial  contribution  due  to  alone.  However, 
plugging  this  coefficient  directly  into  eq.  5.58  usually  gives  a complex  [4y]  rather  than 
a real  [f^y].  This  is  not  desirable  because  the  reconstructed  image  data  should  be  real 
not  complex.  To  ensure  that  [f^y]  is  always  real,  the  partial  contributions  due  to  both 
the  coefficient  received  and  its  complex  conjugate  coefficient  derived  are  computed 
together.  That  is. 


-p  gJ2n(ux/M+vytN)^,p  gJ2ni(M-u)xlM+(N-v)ylN) 


uv 


= F +vy/N)  .p* a -J2n(ttx/M*vylN) 

^ uv^  ^ uyr 

=2/?e{[y  4 


(5.63) 


where  Re{x}  denotes  the  real  part  of  complex  number  x.  Note  that  although  two 
partial  contributions  are  computed,  the  computational  complexity  is  about  the  same 
as  that  of  taking  only  one  partial  contribution.  [4,y]„^  can  be  calculated  by  the  direct 
use  of  eq.  5.56.  However,  a much  faster  approach  based  on  the  symmetrical  property 
of  the  basis  functions  (eq.  5.51)  will  be  given  next. 

From  eq.  5.56,  we  have 


\f  ^ =F  e-J2nMMIR-x)lM*v(filS-y)IN\ 

VMIR-xJ^IS-y^uv 

=g-J2it(.u/R*\iS)p  gJ2n(uxlM*vy/N) 

where  R and  S are  integers  such  that 


UV 


(5.64) 


M=<x^R,  N=a^S  (5.65) 

where  aj  and  a2  are  also  integers.  If  u is  divisible  by  R and  v is  divisible  by  S,  or 

(5.66) 

where  13  ^ and  ^2  integers,  eq.  5.64  can  be  rewritten  as 
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\f  1 =F  g -/2n(«/W+ioyA0  (5.67) 

V M/R-x^lS-yiuv  ^ ’ 

From  eq.  5.63,  we  know  that  we  are  more  interested  in  the  real  part  of  the  partial 
contribution  than  the  partial  contribution  itself.  From  eq.  5.56,  we  can  get 


\ 


(5.68) 


where  Im{x}  is  the  imaginary  part  of  complex  number  x.  From  eq.  5.67,  we  can  get 


m ^ ^ ■ 

From  eqs.  5.68  and  5.69,  we  get 


■/wjFJsii^ 


\ 


\M  N)\ 


(5.69) 


^^^MIR-xj^ls-:^uv}  2//n|Fj^|sin  271 


(5.10) 


M N 

Once  every  term  in  eq.  5.68  is  known,  eq.  5.70  can  be  determined  with  the 


operation  of  shifting  the  register  left  by  one  bit  and  the  operation  of  one  addition. 
The  significance  of  eq.  5.70  is  that  given  some  real  parts  of  the  partial  contribution, 
the  rest  of  them  can  be  determined  by  one  extra  addition.  By  doing  so, 
computational  saving  is  achieved.  Next,  we  examine  how  much  computation  can  be 


saved. 


From  eqs.  5.65  and  5.66,  we  know  that  R must  be  a common  divisor  of  M and 
u and  S must  be  a common  divisor  of  N and  v,  i.e.,  R = cd(M,u)  and  S = cd(N,v), 
where  cd(a,b)  is  a common  divisor  of  integers  a and  b.  The  minimum  of  R = R^j„  = 1 
and  the  maximum  of  R = R„,g,(  =gcd(M,u)  or  R e [l,gcd(M,u)],  where  gcd(a,b)  is  the 
greatest  common  divisor  of  integers  a and  b.  Similarly,  the  minimum  of  S = S„,in  = l 
and  the  maximum  of  S = S„,ax  =gcd(M,u)  or  S e [l,gcd(N,v)].  For  simplicity  and 
practical  applications,  let  M = N=2P,  where  p is  a positive  integer. 

When  Rniax“S„,ax=l>  eq*  5.68  needs  to  be  performed  N^2  times  and  eq.  5.70 
also  needs  to  be  calculated  NV2  times.  When  R„,ax=2  and  S„,ax=  eq.  5.68  needs  to 
be  performed  N"/4  times  and  eq.  5.74  needs  to  be  calculated  3N^/4  times.  In  general, 
eq.  5.68  needs  to  be  performed 
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1 

2 


N \ N 


\ 


(5,71) 


max  / 


times  and  eq.  5.70  needs  to  be  calculated 


l(  N )(  N 


\ 


(5.72) 

times.  Eq.  5.68  takes  2 real  multiplications  and  one  real  addition  and  eq.  5.70 


requires  one  real  addition,  the  number  of  multiplications  is 


N \ N 


\ 


(5,73) 

^Tnax^*^max  j 

and  the  number  of  additions  is  simply  the  sum  of  eq.  5.71  and  eq.  5.72  which  gives 
N . Eq.  5.73  is  the  number  of  multiplications  required  to  reconstruct  the  data  when 
a coefficient  is  received.  Note  that  eq.  5.73  is  a function  of  u and  v.  Different 
values  of  u-v  pairs  will  result  in  different  number  of  multiplications.  Now  we  want 
to  find  the  average  number  of  multiplications  over  every  possible  u-v  pair.  The 
derivation  is  similar  to  that  in  section  5.2.  The  only  difference  is  that  eq.  5.19  should 
be  replaced  by 


(5.74) 

The  derivation  of  the  rest  of  the  equations  following  those  of  eq.  5.23  in  section  5.2 
should  be  adjusted  accordingly.  The  final  result  is 


r -1 

^NxN  g 


\2 


N+- 

N 


(5.75) 


/ 


The  result  is  based  on  the  assumption  that  the  cosine  and  sine  terms  in  eq.  5.68  have 
been  precomputed  and  stored  in  a table.  The  size  of  the  table  may  be  too  large  to 
be  practical  because  it  is  0(N‘*).  The  size  of  the  table  can  be  reduced  to  O(N^)  by 
the  use  of  following  identities: 


cos(a  + P)  =cos(a)cos(P) -sin(a)sin(P) 
sin(a  + P)  =sin(a)cos(P)  +cos(a)sin(P) 

where 


(5.76) 


This  will  increase  the  computational  complexity  because  eq.  5.76  involves  4 
multiplications  and  two  additions.  This  additional  computation  may  be  too  costly  to 
be  acceptable.  So,  for  practical  consideration,  N should  be  kept  small  to  have  a 
reasonable  table  size  and  to  avoid  the  additional  computation. 


CHAPTER  VI 
SIMULATION  STUDIES 


6.1  Sparse  Matrices  Consisting  of  Quantized  DCT  Coefficients 
In  this  section,  many  real  world  images  are  processed  to  see  the  degree  of 
sparsity  of  the  input  matrices  to  IDCT.  The  JPEG  quantization  strategy  and  one  of 
its  recommended  quantization  tables  are  used  in  this  simulation  study. 

If  the  JPEG  quantization  scheme  (eq.  A.4)  and  its  recommended  table  (Table 
A.l)  are  used,  any  spatial  frequencies  whose  amplitudes  are  less  than  half  of  their 
corresponding  values  in  the  quantization  table  will  be  discarded  and  treated  as  zeros 
in  the  IDCT  processing  step.  To  get  a picture  of  how  many  spatial  frequencies  are 
retained  (i.e.,  how  many  non-zeros  are  retained  in  the  IDCT  processing  step)  on  the 
average  after  the  quantization,  some  512  x 512  8-bit  greyscale  images  and  RGB  (red, 
green,  and  blue)  components  of  color  images  were  processed.  The  result  is  shown  in 
the  second  column  of  Table  6.1.  The  result  is  further  divided  into  AC  part  and  DC 
part  and  are  shown  in  columns  3 and  4,  respectively.  It  is  shown  that  more  than  three 
quarters  of  the  quantized  coefficients  are  zero  on  the  average  for  these  images.  Even 
with  so  many  coefficients  set  to  zero,  the  decompressed  images  are  near  perfect  in 
a sense  that  the  difference  between  the  decompressed  image  and  the  original  image 
is  indistinguishable  visually.  To  illustrate,  three  original  images  (Lena,  Boat,  and 
Baboon)  are  shown  in  Fig.  6.1(h),  Fig.  6.2(h),  and  Fig.  6.3(h),  and  their  decompressed 
images  are  shown  in  Fig.  6.1(f),  6.2(f),  and  6.3(f),  respectively.  These  three  images 
are  chosen  because  they  can  roughly  represent  the  images  with  low,  medium,  and 
high  activities,  respectively. 
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Table  6.1  Average  number  of  nonzero  quantized  DCT  coefficients  in  an  8 x 

8 image  block  using  two  similar  quantization  strategies. 


File  Names 

Total 

AC 

DC 

Lena.512 

6.1292/4.0149 

5.1397/3.0378 

0.9895/0.9771 

F16.BLU 

6.1860/3.9050 

5.1875/2.9070 

0.9985/0.9980 

Elaine.512 

7.0893/4.1116 

6.1118/3.1514 

0.9775/0.9602 

F16.RED 

7.5058/4.9294 

6.5112/3.9377 

0.9946/0.9917 

Peppers.GRN 

7.5469/4.7265 

6.5525/3.7385 

0.9944/0.9880 

F16.GRN 

7.8784/5.2495 

6.8848/4.2622 

0.9936/0.9873 

Tiffany.GRN 

7.9226/4.6956 

6.9236/3.6973 

0.9990/0.9983 

Moon.512 

8.0996/4.4619 

7.1245/3.5081 

0.9751/0.9539 

Couple.512 

8.7761/5.6248 

7.8044/4.6794 

0.9717/0.9453 

Boat.512 

9.2012/5.9963 

8.2163/5.0273 

0.9849/0.9690 

SF0.512 

9.6799/5.6426 

8.6916/4.6643 

0.9883/0.9783 

House.GRN 

9.9133/6.5769 

8.9255/5.5991 

0.9878/0.9778 

Mall.512 

13.641/9.3364 

12.649/8.3479 

0.9917/0.9885 

Creek.512 

14.774/9.5554 

13.789/8.5835 

0.9854/0.9719 

Baboon.RED 

15.548/9.8010 

14.565/8.8320 

0.9824/0.9690 
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If  a small  visible  image  degradation  is  allowed,  the  numbers  shown  in  columns 
2 to  4 of  Table  6.1  will  be  smaller.  To  see  how  smaller  the  numbers  will  be,  the 
roundoff  operation  in  eq.  A.4  was  replaced  by  truncation  operation  and  the 
processing  was  repeated.  The  new  numbers  are  shown  as  the  second  number  in 
columns  2 to  4.  The  decompressed  images  are  seen  to  have  only  minor  degradation, 
and  the  degradation  does  not  diminish  our  capability  to  recognize  the  objects  in  the 
images  in  a meaningful  way.  To  illustrate.  Fig.  6.1(g),  Fig  6.2(g),  and  Fig  6.3(g)  are 
the  decompressed  Lena,  Boat,  and  Baboon  images,  respectively,  using  the  truncation 
operation.  In  fact,  the  image  quality  is  much  better  than  it  is  necessary  in  the 
intermediate  stages  of  PIT  for  image  browsing. 

To  illustrate  the  progression  of  PIT,  three  sequences  of  images  are  shown  in 
Figs.  6.1(a)-(e),  Figs.  6.2(a)-(e),  and  Figs.  6.3(a)-(e).  Fig.  6.1(a)  is  the  decompressed 
Lena  image  when  only  the  first  nonzero  quantized  coefficient  along  the  zig-zag  scan 
of  each  8x8  image  block  is  used  to  reconstruct  the  image.  Fig.  6.2(b)  is  the 
decompressed  Lena  image  when  only  the  first  two  nonzero  quantized  coefficients 
along  the  zig-zag  scan  of  each  8x8  image  block  are  used  to  reconstruct  the  image. 
Figs.  6.2(c),  6.2(d),  and  6.2(e)  are  the  decompressed  Lena  images  when  only  the  first 
three,  four,  and  five  nonzero  quantized  coefficients,  respectively,  along  the  zig-zag 
scan  of  each  8x8  image  block  are  used  to  reconstruct  the  image.  The  images  in  Figs. 
6.2(a)-(e)  and  Figs.  6.3(a)-(e)  were  obtained  in  the  same  manner  as  those  in  Figs. 
6.1(a)-(e).  The  essential  features  of  the  Lena,  Boat,  and  Baboon  images  can  be  well 
recognized  in  the  fifth  stage  of  progression  (Fig.  6.1(e),  Fig.  6.2(e),  and  Fig.  6.3(e)). 
That  is,  approximately  five  nonzero  quantized  coefficients  in  each  8x8  image  block 
are  enough  to  reconstruct  a recognizable  image.  An  8x8  matrix  consisting  of  only 
five  nonzero  elements  is  definitely  a good  candidate  for  a sparse  matrix. 

Note  that,  from  Table  6.1,  the  number  of  nonzero  quantized  DCT  coefficients 
decreases  significantly  at  the  expense  of  a minor  degradation  in  image  quality.  In  the 
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Fig.  6.1  The  decompressed  Lena  images  under  different  input  conditions. 
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Fig.  6.1  Continued. 
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Fig.  6.2  The  decompressed  Boat  images  under  different  input  conditions. 
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Fig.  6.2  Continued. 
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Fig.  6.3  The  decompressed  Baboon  images  under  different  input  conditions. 
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Fig.  6.3  Continued. 
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inverse  PIT,  the  average  number  of  nonzero  elements  in  an  8 x 8 matrix  does  not 
need  to  be  greater  than  the  second  numbers  in  column  2.  These  numbers  imply  that 
most  of  the  matrices  consisting  of  quantized  DCT  coefficients  for  image 
decompression  can  be  considered  sparse.  To  visualize  which  quantized  coefficients 
are  usually  retained  and  which  are  not,  the  distributions  of  nonzero  DCT  coefficients 
in  an  8 X 8 image  block  for  Lena,  Boat,  and  Baboon  images  are  shown  in  Fig.  6.4, 
Fig.  6.5,  and  Fig  6.6,  respectively.  As  expected,  the  JPEG  quantizer  as  well  as  other 
practical  quantizers  tend  to  discard  the  coefficients  of  high  spatial  frequency  and 
retain  most  of  the  low  frequency  coefficients.  By  comparing  the  image  activity  of  the 
three  images  and  the  Figs.  6.4  to  6.6,  we  may  conclude  that  the  nonzero  DCT 
coefficients  of  a busy  image  are  more  equally  distributed  than  those  of  an  image  with 
low  activity  (or  a smooth  image). 

6.2  Functional  Test  of  the  Proposed  Algorithms 

The  purpose  of  this  simulation  study  is  to  show  the  proposed  algorithms  can 
perform  the  correct  functions.  In  other  words,  the  proposed  IDCT  and  IDFT 
algorithms  can  perform  the  function  of  inverse  discrete  cosine  transform  and  inverse 
discrete  Fourier  transform,  respectively.  Specifically,  given  some  test  data  as  inputs 
to  the  proposed  algorithms,  the  outputs  of  the  proposed  algorithms  should  be  the 
same  as  those  produced  by  the  definitions  of  the  IDCT  and  the  IDFT. 

To  generate  the  test  data,  both  randomly  generated  numbers  (real  numbers) 
and  real  image  data  (integers  between  0 to  255)  are  used  as  inputs  to  the  forward 
transforms  (FDCT  and  FDFT).  The  forward  transforms  are  performed  first  by  the 
definitions  of  the  transforms.  Two  traditional  fast  transforms  are  used  later  to  speed 
up  the  process.  They  are  the  fast  DCT  developed  by  Chen  et  al.  [1977]  and  the  FFT 
developed  by  Cooley  and  Turkey  [1965].  4-point  and  8-point  fast  DCT  and  FFT  are 
used  in  this  simulation  study.  Floating  points  rather  than  fixed  points  are  used  to 


148 


Histogram  of  no.  of  nonzero  quantized  DCT  coefs. 
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Fig.  6.4  The  distribution  of  quantized  DCT  coefficients  for  Lena  image. 
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Histogram  of  no.  of  nonzero  quantized  DCT  coefs.s 
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Fig.  6.5  The  distribution  of  quantized  DCT  coefficients  for  Boat  image. 
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Histogram  of  no.  of  nonzero  quantized  DCT  coefs. 
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Fig.  6.6  The  distribution  of  quantized  DCT  coefficients  for  Baboon  image. 
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store  the  DCT  and  the  DFT  coefficients  to  ensure  the  desired  precision.  The  DCT 
coefficients  and  the  DFT  coefficients  are  then  used  as  inputs  to  the  proposed  IDCT 
and  IDFT  algorithms,  respectively.  The  outputs  are  compared  to  the  data  that  are 
used  to  produce  the  DCT  and  the  DFT  coefficients.  As  expected,  they  show  perfect 
match  in  every  test. 

6.3  Performance  Test  of  the  Proposed  Algorithms 

Programs  in  C language  are  written  to  implement  the  proposed  IDCT 
algorithm.  Two  personal  computers  (486-33MHz  and  386-20MFIz)  are  used  to  run 
the  programs.  The  running  time  of  the  programs  is  recorded  as  a performance  index 
of  the  algorithm.  For  a performance  comparison,  Chen’s  fast  IDCT  is  also 
implemented  and  run  by  using  the  same  personal  computers. 

Only  the  8 X 8 IDCT  is  implemented.  Three  512  x 512  images  (Lena,  Boat, 
and  Baboon)  with  8-bit  precision  are  used  to  test  the  running  time  of  Chen’s  and  the 
proposed  fast  IDCTs.  The  results  are  shown  under  the  columns  of  Timel,  Time2, 
Time3,  and  Time4  in  Table  6.2,  Table  6.3,  and  Table  6.4,  respectively.  Timel  is 
obtained  by  using  the  proposed  algorithm  and  the  486-33MHz  personal  computer; 
Time2  is  obtained  by  the  use  of  Chen’s  algorithm  and  the  486-33MHz  personal 
computer;  Time3  is  obtained  by  using  the  proposed  algorithm  and  the  386-20MHz 
personal  computer;  Time  4 is  obtained  by  using  Chen’s  algorithm  and  the  386- 
20MHz  personal  computer.  Row  (a)  to  Row  (g)  of  Table  6.2,  Table  6.3,  or  6.4 
correspond  to  different  conditions  (sparsity)  of  input  matrices  to  be  used  for 
reconstructing  the  images  in  Figs.  6.1(a)-(g),  Figs.  6.2(a)-(g),  and  Figs.  6.3(a)-(g), 
respectively.  For  instance.  Row  (c)  of  Table  6.3  corresponds  to  the  condition  that  the 
first  three  nonzero  quantized  coefficients  along  the  zig-zag  scan  are  taken  to  be  the 
only  three  nonzero  elements  in  the  8x8  input  matrices.  Under  this  condition,  the 
resulting  image  is  shown  in  Fig.  6.2(c). 
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Table  6.2  The  results  of  processing  image  file  Lena.512. 


BPP 

CR 

S/N 

NMSE 

Timel 

Time2 

Time3 

Time4 

a 

.0871 

91.2 

24.9 

.0179 

1.1 

3.9 

3.3 

11.8 

b 

.1888 

42.4 

27.2 

.0106 

1.7 

3.8 

4.5 

11.5 

c 

.2750 

29.1 

28.6 

.0077 

2.4 

4.0 

6.3 

11.3 

d 

.3470 

23.0 

29.1 

.0068 

2.8 

4.0 

7.8 

11.0 

e 

.4048 

19.8 

30.3 

.0052 

3.2 

3.9 

7.8 

10.8 

f 

.7003 

11.4 

36.6 

.0012 

5.2 

3.9 

13.5 

11.0 

g 

.4806 

16.6 

34.8 

.0018 

3.5 

4.0 

9.3 

12.0 

Table  6.3  The  results  of  processing  image  file  Boat.512. 


BPP 

CR 

S/N 

NMSE 

Timel 

Time2 

Time3 

Time4 

a 

.0824 

97.1 

22.1 

.0280 

1.2 

3.9 

3.3 

11.8 

b 

.1919 

41.7 

23.3 

.0212 

1.8 

3.9 

4.5 

11.8 

c 

.2943 

27.2 

24.8 

.0148 

2.5 

4.0 

7.0 

11.5 

d 

.3838 

20.8 

25.5 

.0128 

3.1 

4.0 

8.8 

11.8 

e 

.4684 

17.1 

26.2 

.0107 

3.7 

4.0 

10.5 

11.8 

f 

1.0531 

7.6 

33.5 

.0020 

7.5 

4.2 

20.3 

11.8 

g 

.7341 

10.9 

31.7 

.0031 

5.0 

4.1 

13.0 

11.8 

Table  6.4  The  results  of  processing  image  file  Baboon.RED. 


BPP 

CR 

S/N 

NMSE 

Timel 

Time2 

Time3 

Time4 

a 

.0892 

89.7 

20.1 

.0957 

1.2 

4.0 

3.5 

11.3 

b 

.2049 

39.0 

20.6 

.0840 

1.8 

3.9 

5.5 

11.5 

c 

.3160 

25.3 

21.3 

.0720 

2.5 

3.9 

6.5 

11.5 

d 

.4200 

19.0 

21.7 

.0653 

3.2 

4.0 

8.3 

12.0 

e 

.5214 

15.3 

22.1 

.0605 

4.0 

4.0 

10.8 

12.5 

f 

1.7825 

4.5 

27.8 

.0162 

12.3 

4.2 

31.5 

12.3 

g 

1.2235 

6.5 

26.1 

.0239 

8.0 

4.2 

21.0 

12.8 
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The  running  time  shown  in  Tables  6.2,  6.3,  and  6.4  is  in  second.  Based  on  the 
data  under  Time2  and  Time4  of  these  tables,  the  running  time  of  the  traditional  fast 
inverse  transform  algorithms  is  essentially  a constant.  On  the  other  hand,  the  running 
time  of  the  proposed  algorithm  is  roughly  proportional  to  the  number  of  nonzero 
elements  in  the  input  matrices.  It  is  found  that  if,  on  the  average,  an  8 x 8 matrix 
consists  of  less  than  5 nonzero  transformed  coefficients,  the  proposed  algorithm  can 
outperform  Chen’s  algorithm.  This  implies  that  for  PIT  if  less  than  five  nonzero 
quantized  coefficients  in  an  8 x 8 image  block  are  sent  for  one  stage  of  image 
reconstruction,  the  proposed  algorithm  is  better.  For  example,  consider  the  case  when 
one  nonzero  coefficient  of  the  8x8  image  block  is  sent  for  one  stage  of  image 
reconstruction.  Suppose  the  Lena  image  is  transmitted  progressively  as  shown  in  Figs. 
6.1(a)-(e).  In  this  case,  the  processing  time  of  IDCTs  is  saved  by  3.9  + 3.8  + 4.0  + 
4.0  + 3.9  - 3.2  = 16.4  (seconds)  if  the  486-33MHz  personal  computer  is  used  or  11.8 
+ 11.5  + 11.3  + 11.0  + 10.8  - 7.8  = 48.6  (seconds)  if  the  386-20MHz  personal 
computer  is  used  (refer  to  Table  6.2).  Similarly,  to  send  the  Boat  image  and  the 
Baboon  image,  the  processing  time  of  IDCTs  is  saved  by  16.1  seconds  (486-33MHz) 
or  48.2  seconds  (386-20MHz)  and  15.8  seconds  (486-33MHz)  or  48.0  seconds  (386- 
20MHz),  respectively. 

Besides  the  time  measurements  given  in  Tables  6.2  to  6.4,  four  other 
parameters  associated  with  the  images  in  Figs.  6.1  to  6.3  are  also  shown  in  these 
tables.  The  first  parameter  is  BPP  or  bit  per  pixel  as  defined  in  Chapter  I of  this 
dissertation.  The  second  parameter  is  the  compression  ratio  (CR)  which  is  also 
defined  in  the  first  chapter  of  this  dissertation.  The  Huffman  code  table  used  to 
generate  the  data  for  these  two  parameters  is  a fbced  table  recommended  by  JPEG. 
The  table  can  be  found  in  Gonzalez  and  Woods  [1992].  The  third  parameter  is  the 
peak  signal  to  noise  ratio  (S/Np^g^)  defined  as  follows: 
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where  M and  N are  the  row  size  and  column  size  of  an  image,  respectively,  and  f 
represents  the  original  image  and  denotes  an  estimate  or  approximation  of  f^ 
that  results  from  compressing  and  subsequently  decompressing  the  image.  The  fourth 
parameter  is  the  normalized  mean  squared  errors  (NMSE)  defined  as 
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From  Tables  6.2  to  6.4,  note  that  both  BPP  and  S/N  are  monotonically  increasing 
and  both  CR  and  NMSE  are  monotonically  decreasing  as  more  nonzero  coefficients 
are  used  to  reconstruct  the  image.  This  result  is  typical  in  any  practical  PIT  schemes. 


CHAPTER  VII 

SUMMARY  AND  CONCLUSIONS 

7.1  Summary 

The  objective  of  this  study  is  to  develop  new  fast  algorithms  for  use  in  image 
decompression  and  to  show  that  the  developed  algorithms  outperform  traditional 
algorithms  in  computational  complexity.  Specifically,  this  study  focuses  on  a coding 
scheme  called  progressive  image  transmission  (PIT).  Using  such  a scheme,  images 
can  be  reconstructed  gradually  at  the  receiver  through  an  inverse  PIT  process. 

The  dissertation  was  presented  as  follows.  First,  a general  framework  by  which 
fast  inverse  PIT  can  be  developed  is  proposed.  Second,  based  on  the  framework,  fast 
inverse  discrete  cosine  transform  and  inverse  discrete  Fourier  transform  algorithms 
are  developed.  Finally,  the  computational  complexity  of  the  developed  algorithms  are 
compared  with  that  of  the  traditional  algorithms  through  analytical  and  simulation 
studies. 

It  has  been  shown  that  the  proposed  algorithms  are  very  comparable  to 
conventional  IDCT  and  IDFT  algorithms  in  computational  complexity  for  sparse 
matrices.  For  progressive  image  display,  where  the  input  matrbc  is  usually  highly 
sparse,  the  proposed  approach  can  outperform  them  in  a significant  margin.  The 
sparser  the  input  matrbc  is,  the  faster  our  approach  could  be.  This  feature  allows  the 
processing  time  to  be  proportional  to  the  activities  of  the  images  under  consideration. 
On  the  other  hand,  traditional  algorithms  have  a constant  processing  time  which  is 
not  adaptable  to  different  input  situations.  Furthermore,  it  does  not  have  the 
scanning  pattern  or  delay  time  problem  which  is  inherent  in  previous  approach  for 
fast  progressive  image  reconstruction. 
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12  Significance  of  This  Research 

A new  and  promising  approach  to  the  problem  of  transform-based  fast 
progressive  image  reconstruction  is  proposed  in  our  research.  The  idea  used  in  the 
approach  is  fundamentally  different  from  that  used  in  the  conventional  approach. 

The  analysis  indicates  that  the  developed  algorithms  have  lower  computational 
complexity  when  compared  to  traditional  algorithms.  The  main  reasons  are  that  the 
developed  algorithms  (1)  exploit  the  feature  that  for  most  images  the  input  matrices 
to  an  inverse  transform  are  very  sparse,  and  (2)  make  use  of  the  symmetrical 
property  of  the  basis  functions  of  the  transform.  The  algorithms  also  have  the 
following  features:  (1)  They  allow  transformed  coefficients  be  transmitted  in  any 
order,  including  the  zig-zag  scanning;  (2)  They  have  essentially  no  delay  time;  i.e.,  the 
image  reconstruction  can  be  initiated  as  soon  as  the  first  coefficient  is  received;  and 
(3)  The  images  can  be  reconstructed  progressively  with  any  degree  of  smoothness. 
These  features  make  the  algorithms  particularly  suitable  for  use  in  a runlength  based 
entropy  coding  scheme  such  as  the  one  employed  in  the  JPEG  standard. 

7.3  Recommendations  for  Future  Research 

Since  the  proposed  approach  is  new,  many  follow-up  studies  are  possible. 
Specifically,  the  following  two  studies  are  of  high  interest  in  the  future.  First,  consider 
the  hardware  implementation  of  the  proposed  algorithms.  There  are  VLSI 
implementations  of  traditional  DCT  and  IDCT  algorithms  (Rao  and  Yip  [1990]). 
With  modern  VLSI  technology,  the  proposed  algorithms  can  surely  be  hardware 
implemented,  but  how  they  can  be  efficiently  implemented  remains  to  be 
investigated.  Second,  consider  the  developments  of  the  fast  algorithms  other  than 
IDCT  and  IDFT.  It  is  shown  in  this  dissertation  that  many  commonly  used  transforms 
have  symmetrical  basis  functions.  We  have  only  develop  the  algorithms  for  two  of  the 
transforms,  namely,  IDCT  and  IDFT.  The  potential  of  developing  fast  algorithms  for 
the  rest  of  the  transforms  is  high.  Thus,  one  future  work  is  to  develop  the  fast 
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algorithms  for  some  of  those  transforms  and  compare  their  performances  with  their 
traditional  counterparts. 


APPENDIX  A 

A SUMMARY  OF  THE  JPEG  CODING  SCHEME 


There  are  four  modes  of  operation  in  the  JPEG  standard:  (1)  Sequential 
encoding:  each  image  component  is  encoded  in  a single  left-to-right,  top-to-bottom 
scan,  (2)  progressive  encoding:  the  image  is  encoded  in  multiple  scans  for 
applications  in  which  transmission  time  is  long,  and  the  viewer  prefers  to  watch  the 
image  build  up  in  multiple  coarse-to-clear  passes,  (3)  lossless  encoding,  and  (4) 
hierarchical  encoding:  the  image  is  encoded  at  multiple  resolutions  so  that  lower- 
resolution  versions  may  be  accessed  without  first  having  to  decompress  the  image  at 
its  full  resolution.  In  mode  1 operation,  to  view  an  image  at  browsing  stage,  all 
compressed  data  of  the  image  need  to  be  sent.  On  the  other  hand,  in  mode  2 
operation,  usually  only  part  of  the  compressed  data  need  to  be  sent.  In  mode  3 
operation,  it  is  not  efficient  to  browse  through  images  due  to  its  low  compression 
factor.  However,  it  can  be  used  when  the  image  of  interest  has  been  determined  at 
the  browsing  stage  and  the  user  requires  a perfect  reconstruction  of  the  image.  Mode 
4 operation  requires  filtering,  up/down  sampling,  and  differential  encoding  by  mode 
1,  2,  or  3.  Both  mode  2 and  mode  4 are  suitable  for  the  telebrowsing  of  image 
databases  at  the  browsing  stage.  But  mode  2 is  preferred  because  it  is  much  simpler 
to  implement  than  mode  4.  Therefore,  only  mode  2 of  the  JPEG  standard  will  be 
discussed  further.  Since  mode  2 operation  consists  of  the  same  Forward  DCT 
(FDCT)  and  quantization  steps  that  are  used  by  mode  1 operation,  mode  1 operation 
will  also  be  summarized. 
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A.1  Sequential  Mode  Codec  in  JPEG  Standard 

Figs  A.l  and  A.2  show  the  key  processing  steps  which  are  the  heart  of  the 
DCT-based  operations.  As  inputs  to  the  encoder,  source  image  samples  with  p bits 
precision  are  grouped  into  8x8  blocks,  and  the  mean  level  was  shifted  from 
unsigned  integers  with  range  [0,  2^-1]  to  signed  integers  with  range  [-2^^,  2P"^-1]. 
FDCT  is  then  performed  on  the  range  shifted  image  samples.  As  the  outputs  from 
the  decoder,  the  IDCT  generates  8x8  blocks  of  reconstructed  images.  The  FDCT 
and  IDCT  for  8 x 8 blocks  are  defined  as  follows: 
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where  x and  y are  spatial  coordinates  and  u and  v are  spatial  frequency  coordinates 


and  C„  and  C^  are  defined  as  follows: 


(A.3) 


C^=l  for  r * 0 
=—  for  r=0 

The  output  of  the  FDCT  is  the  DCT  coefficients.  The  coefficient  with  zero 


frequency  (u=v=0)  in  both  dimensions  is  called  the  "DC  coefficient"  and  the 
remaining  63  coefficients  are  called  the  "AC  coefficients."  The  range  shift  of  the 
source  image  samples  only  affects  the  DC  coefficients  (see  Appendix  B for 
mathematical  analysis).  This  extra  effort  of  data  shifting  is  probably  to  reduce  the 
risk  of  overflow  for  storing  DC  coefficients. 

After  the  output  from  the  FDCT,  each  of  the  64  DCT  coefficients  is  uniformly 
quantized  in  conjunction  with  a 64-element  quantization  table.  Each  element  can  be 
any  integer  value  between  1 and  255,  which  specifies  the  step  size  of  the  quantizer 
or  its  corresponding  DCT  coefficient.  The  goal  of  this  processing  step  is  to  discard 
information  which  is  not  visually  significant. 
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8x8  blocks  DCT-Based  encoder 


Fig.  A.l  DCT-based  encoder  processing  steps. 


DCT-Based  decoder 


Fig.  A.2  DCT-based  decoder  processing  steps. 
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Quantization  is  defined  as  the  division  of  each  DCT  coefficient  by  its 


corresponding  quantizer  step  size,  followed  by  rounding  to  the  nearest  integer: 


F^y=Integer  Round 


\ 


uv 


IQ 


(A.4) 


UV  / 


where  Q„v  is  the  step  size  of  a uniform  quantizer  and  its  value  is  obtained  from  the 
(u,v)-th  position  of  the  quantization  table. 

This  output  of  eq.  A.4  is  entropy  encoded  and  sent  to  the  receiver.  At  the 
receiver,  entropy  decoding  is  performed  first,  and  followed  by  dequantization.  In  this 
case,  dequantization  is  a simple  multiplication  operation  which  obtains  an 
approximation  of  denoted  by 


F*^'=F*^*Q 

^ UV  ^ UV  >C  uv 


(A.5) 


The  output  of  eq.  A.5  is  then  used  as  input  to  the  IDCT.  Ideally,  when  the  aim  is  to 
compress  the  image  as  much  as  possible  without  visible  artifacts,  each  step  size 
should  be  chosen  as  the  perceptual  threshold  or  "just  noticeable  difference"  for  the 
visual  contribution  of  its  corresponding  cosine  basis  function.  These  thresholds  are 
also  functions  of  the  source  image  characteristics,  display  characteristics,  and  viewing 
distance.  For  applications  in  which  these  variables  can  be  reasonably  well  defined, 
psychovisual  experiments  can  be  performed  to  determine  the  best  thresholds.  The 
experiment  described  by  Lohscheller  [1984]  has  been  used  by  the  JPEG  members. 
The  basic  idea  of  the  experiment  is  as  follows.  An  image  consisting  of  only  uniform 
background  is  given  first.  Then  a stimulus  image  is  added  to  the  background  image 
gradually  until  the  difference  between  the  resulting  image  and  the  original 
background  image  is  just  noticeable.  Taking  the  DCT  of  the  final  stimulus  image 
gives  us  the  visibility  threshold.  After  extensive  experiments,  the  JPEG  suggests  a 
quantization  table  for  grayscale  images  or  the  luminance  part  of  the  color  images 
(Leger  et  al.  [1991],  Wallace  [1992]).  The  table  is  shown  in  Table  A.I. 
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Table  A.l  A quantization  table  recommended  by  JPEG  standard. 


16 

11 

10 

16 

24 

40 

51 

61 

12 

12 

14 

19 

26 

58 

60 

55 

14 

13 

16 

24 

40 

57 

69 

56 

14 

17 

22 

29 

51 

87 

80 

62 

18 

22 

37 

56 

68 

109 

103 

77 

24 

35 

55 

64 

81 

104 

113 

92 

49 

64 

78 

87 

103 

121 

120 

101 

72 

92 

95 

98 

112 

100 

103 

99 

After  quantization,  the  DC  coefficient  is  treated  separately  from  the  63  AC 
coefficients.  The  DC  coefficient  is  a measure  of  the  average  value  of  the  64  image 
samples.  Because  there  is  usually  strong  correlation  between  the  DC  coefficients  of 
adjacent  8x8  blocks,  the  quantized  DC  coefficients  are  encoded  as  the  difference 
of  the  DC  coefficients  between  the  current  block  and  previous  block.  The  DC 
coefficient  of  the  first  block  is  sent  directly.  Finally,  all  of  the  quantized  coefficients 
are  ordered  into  a "zig-zag"  sequence  shown  in  Fig.  2.5. 

The  final  DCT-based  encoder  processing  step  is  the  entropy  coding.  Two 
entropy  coding  methods  are  recommended  by  JPEG  standard  --  Huffman  coding  and 
arithmetic  coding.  In  general,  using  the  arithmetic  coding  can  produce  5-10%  better 
compression  than  using  the  Huffman  coding  (Wallace  [1992]).  However,  using  the 
arithmetic  coding  requires  more  extensive  computational  effort  (Wallace  [1992]). 

A.2  Progressive  Mode  Operation  in  JPEG  Standard 

There  are  two  complementary  methods  by  which  a block  of  quantized  DCT 
coefficients  may  be  partially  encoded.  In  the  first  method,  only  a specified  "band"  of 
coefficients  from  the  zig-zag  sequence  is  encoded  within  a given  scan.  This  method 
is  called  "spectral  selection,"  because  each  band  typically  contains  coefficients  which 
occupy  a lower  or  higher  part  of  the  spatial-frequency  spectrum  for  that  8x8  block. 
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In  the  second  method,  the  coefficients  within  the  current  band  are  not  encoded  to 
their  full  (quantized)  accuracy  in  a given  scan.  At  the  first  encoding,  the  N most 
significant  bits  (MSB)  of  the  coefficients  are  encoded,  where  N is  specified  by  users. 
The  less  significant  bits  (LSB)  are  subsequently  encoded.  This  method  is  called 
"successive  approximation."  These  two  methods  can  be  used  separately,  or  in  certain 
combinations.  Some  intuition  for  spectral  selection  and  successive  approximation  can 
be  obtained  from  Fig.  A.3.  The  quantized  DCT  coefficient  information  can  be  viewed 
as  a rectangle  for  which  the  axes  are  the  DCT  coefficients  (in  zig-zag  order)  and 
their  amplitudes.  Spectral  selection  slices  the  information  in  one  direction  and 
successive  approximation  in  the  other  direction. 

A DCT  and  quantization  example  for  JPEG  codec  is  shown  in  Fig.  A.4.  Note 
that  the  matrk  in  Fig.  A.4(d)  is  the  input  to  an  8 x 8 IDCT.  The  matrix  is  sparse  and 
its  nonzero  elements  occupy  only  the  first  few  places  along  the  ziz-zag  scanning 
sequence.  This  characteristic  is  generally  true  for  most  of  the  grayscale  or  color 
images  which  are  grouped  into  8x8  blocks.  When  image  telebrowsings  are 
considered,  not  all  nonzero  elements  need  to  be  sent.  This  results  in  an  even 
sparsermatrix  than  that  in  Fig.  A.4(d).  This  characteristic  will  be  exploited  to  reduce 
the  computational  effort  of  the  IDCTs  in  this  dissertation. 


Fig,  A.3  Spectral  selection  and  successive  approximation  for  PIT. 
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(b)  forward  DCT  coefficients 
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Fig.  A.4  DCT  and  quantization  examples. 


APPENDIX  B 

DCT  COEFFICIENT  CHANGES  WITH  RESPECT  TO 
A CONSTANT  SHIFT  OF  SOURCE  IMAGE  SAMPLES 


Let  F’uy  be  the  F„^  after  each  f is  shifted  by  m,  then 
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It  can  be  rewritten  as 
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The  double  summation  can  be  splitted  into  two  separate  summations: 
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The  two  summations  can  be  evaluated  independently: 
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If  u is  odd, 
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‘ 0 if  V * 0 

Using  the  results  of  equations  B8  and  B9,  we  can  rewrite  equation  B1  as 

P'uv  = - 8w/or  M=0,  v=0 


= otherwise 


(B.IO) 


This  confirms  that  the  range  shift  of  the  source  image  data  only  affects  the  DC 


coefficients. 


APPENDIX  C 

SYMMETRICAL  RELATIONSHIPS  OF 
THE  PARTIAL  CONTRIBUTION  FOR  DCT 


If  [fjuv  is  defined  as  the  partial  contribution  of  F^^  to  the  goal  matrix  [f^^  ], 


then  from  eq.  5.9,  we  have 
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We  will  show  that 
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where  R = cd[M,u],  S = cd[N,v],  and  cd[a,b]  is  the  common  divisor  of  a and  b. 
Let  f(x,u,M)  be  a function  defined  as 
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Eq.  C.6  can  be  rewritten  as 
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By  using  the  trigonometric  identity  cos(a  -)0 ) = cos(a  )cos()0 ) + sin(a  )sin(j0 ),  eq.  C.1 
can  be  expanded  as 
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If  u is  divisible  by  R,  or 


M = p/? 


(C.9) 


where  /3  is  an  integer,  the  sine  term  in  eq.  C.8  vanishes.  In  this  case,  eq.  C.8  becomes 
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From  eq.  C.5  and  the  identity  cos(/3n)  = (-lf,  eq.  C.IO  can  be  written  as 
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Now  we  can  start  to  show  eq.  C.2.  Eq.  C.l  can  be  written  in  terms  of  eq.  C.5  as 
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By  using  eq.  C.ll,  eq.  C.13  can  be  written  as 
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R 

is  a location  index  of  the  partial  contribution,  it  should  be  a nonnegative  integer  for 
practical  consideration.  Thus  M must  be  divisible  by  R,  or  equivalently 

M=aR  (C.15) 

From  eq.  C.9  and  eq.  C.15,  we  know  that  R = cd(M,u).  This  completes  the  proof  of 
eq.  C.2.  With  x,  u,  and  M replaced  by  y,  v,  and  N,  respectively,  we  can  show  eq.  C.3 
in  the  same  manner. 
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Eq.  C.4  can  be  derived  from  eq.  C.2  and  eq.  C.3  as  follows.  Substituting  y in 


eq.  C.2  by 
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will  result  in 
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By  using  eq.  C.3,  eq.  C.17  can  be  written  as 
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This  completes  the  proof  of  eq.  C.4. 
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